Data Analytics & Business Intelligence Portfolio
Developed interactive Tableau dashboards to analyze sales, profitability, and customer segmentation across multiple regions.
• Built dashboards for sales analysis in the USA and EU.
• Created visualizations to analyze customer segmentation in the UK.
• Designed charts and maps to highlight trends and performance.
Technologies: Tableau, SQL, Excel.
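
The region/segment aggregation behind these dashboards can be prototyped before building the Tableau views; a minimal pandas sketch, using made-up sample data (all names and figures illustrative):

```python
import pandas as pd

# Illustrative sample data; the real dashboards are fed from the source systems
sales = pd.DataFrame({
    "region":  ["USA", "USA", "EU", "EU", "UK", "UK"],
    "segment": ["Consumer", "Corporate", "Consumer", "Corporate", "Consumer", "Corporate"],
    "sales":   [1200.0, 800.0, 950.0, 400.0, 600.0, 300.0],
    "profit":  [240.0, 120.0, 95.0, 80.0, 90.0, 30.0],
})

# Aggregate sales and profit by region and segment, then derive a profit margin
summary = (sales.groupby(["region", "segment"], as_index=False)
                .agg(total_sales=("sales", "sum"), total_profit=("profit", "sum")))
summary["margin"] = summary["total_profit"] / summary["total_sales"]
print(summary)
```

The same grouped measures (total sales, total profit, margin) map directly onto the dashboard's charts and maps.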



Built an end-to-end analytics pipeline on Google Cloud to generate, transform, and load data into BigQuery and support reporting and analysis.
• Generated synthetic data using Python.
• Built ETL pipelines using Google Data Fusion.
• Loaded data into BigQuery and performed SQL queries.
• Created dashboards in Looker for reporting.
Technologies: Python, SQL (BigQuery), Google Data Fusion, Looker.

import csv
from faker import Faker

fake = Faker()

def sanitize_text(text):
    """Remove newlines, carriage returns, and commas so each value stays in one CSV field."""
    return text.replace('\n', ' ').replace('\r', ' ').replace(',', ' ').strip()

def generate_employee_data(num_employees=1000):
    with open("cleaned_employee_data.csv", "w", newline="", encoding="utf-8") as csvfile:
        fieldnames = [
            "employee_id", "first_name", "last_name", "email", "phone_number", "address", "birthdate",
            "hire_date", "job_title", "department", "salary", "password"
        ]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for i in range(1, num_employees + 1):
            writer.writerow({
                "employee_id": i, "first_name": fake.first_name(), "last_name": fake.last_name(),
                "email": fake.email(), "phone_number": sanitize_text(fake.phone_number()),
                "address": sanitize_text(fake.address()), "birthdate": str(fake.date_of_birth(minimum_age=18, maximum_age=65)),
                "hire_date": str(fake.date_between(start_date="-10y")), "job_title": sanitize_text(fake.job()),
                "department": fake.word(), "salary": fake.random_int(min=30000, max=150000),
                "password": fake.password(),
            })
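
The same per-field sanitization can also be applied column-wise during the transform step of the pipeline; a small pandas sketch on inline sample data (values invented for illustration):

```python
import pandas as pd

# Illustrative records containing the kinds of characters sanitize_text removes
df = pd.DataFrame({"address": ["12 Main St,\nSpringfield", "4 Oak Ave\r\nLondon"]})

# Replace newlines, carriage returns, and commas, collapse repeated spaces, trim
df["address"] = (df["address"]
                 .str.replace(r"[\n\r,]", " ", regex=True)
                 .str.replace(r"\s+", " ", regex=True)
                 .str.strip())
print(df["address"].tolist())
```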

Performed data preparation and built a predictive analytics model using PySpark to identify patterns in laboratory data.
• Cleaned and transformed data.
• Performed feature engineering.
• Built and evaluated a predictive model.
Technologies: PySpark, Python, Machine Learning (Logistic Regression), Pandas, NumPy, Matplotlib.
!pip install pyspark
# Load our packages
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler

# Start a Spark session
spark = SparkSession.builder.appName("MLwithSpark").getOrCreate()

# Load our dataset
df = spark.read.csv("/content/drive/MyDrive/Colab Notebooks/Data/hcvdata.csv", header=True, inferSchema=True)

# Assemble the feature columns into a single vector
# (columns are assumed numeric after the cleaning and feature-engineering steps)
feature_cols = [c for c in df.columns if c != 'Target']
assembler = VectorAssembler(inputCols=feature_cols, outputCol='features')
vec_df = assembler.transform(df).select('features', 'Target')

# Split into training and test sets
train_df, test_df = vec_df.randomSplit([0.7, 0.3])

from pyspark.ml.classification import LogisticRegression, DecisionTreeClassifier

# Logistic regression model
lr = LogisticRegression(featuresCol='features', labelCol='Target')
lr_model = lr.fit(train_df)
y_pred = lr_model.transform(test_df)
y_pred.show()
y_pred.select('Target', 'rawPrediction', 'probability', 'prediction').show()

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Check accuracy on the test set
multi_evaluator = MulticlassClassificationEvaluator(labelCol='Target', metricName='accuracy')
multi_evaluator.evaluate(y_pred)

from pyspark.mllib.evaluation import MulticlassMetrics

# MulticlassMetrics expects an RDD of (prediction, label) pairs of floats
pred_and_labels = y_pred.select('prediction', 'Target').rdd.map(lambda row: (float(row[0]), float(row[1])))
lr_metric = MulticlassMetrics(pred_and_labels)
print("Accuracy", lr_metric.accuracy)
print("Precision", lr_metric.precision(1.0))
print("Recall", lr_metric.recall(1.0))
print("F1 Score", lr_metric.fMeasure(1.0))
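
The accuracy, precision, recall, and F1 figures reported above follow directly from the confusion counts; a small pure-Python sanity check (the predicted and actual labels here are invented for illustration):

```python
# Toy actual/predicted labels, invented for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_hat  = [1, 1, 0, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 0)  # false negatives

accuracy  = sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)
precision = tp / (tp + fp)          # of everything predicted 1, how much was right
recall    = tp / (tp + fn)          # of all actual 1s, how many were found
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # → 0.75 0.75 0.75 0.75
```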
Collected social media data using APIs and performed text analytics and sentiment analysis to identify trends.
• Extracted and cleaned text data.
• Classified sentiment as positive, negative, or neutral.
• Visualized results.
Technologies: Python, Tweepy, TextBlob, VADER (SentimentIntensityAnalyzer), Pandas, Matplotlib.
import tweepy
import pandas as pd
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Authenticate with the Twitter API (credential variables are placeholders)
auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessTokenSecret)
api = tweepy.API(auth)

# Score each collected tweet (the `tweets` collection is fetched earlier in the notebook)
tweet_list = []
polarity = 0
for tweet in tweets:
    tweet_list.append(tweet.text)
    analysis = TextBlob(tweet.text)
    score = SentimentIntensityAnalyzer().polarity_scores(tweet.text)
    neg = score['neg']
    neu = score['neu']
    pos = score['pos']
    comp = score['compound']
    polarity += analysis.sentiment.polarity

# Split the labelled tweets by sentiment
tw_list_negative = tw_list[tw_list["sentiment"] == "negative"]
tw_list_positive = tw_list[tw_list["sentiment"] == "positive"]
tw_list_neutral = tw_list[tw_list["sentiment"] == "neutral"]
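
The positive/negative/neutral labels used in the filtering above come from thresholding VADER's compound score; a minimal sketch (the ±0.05 cut-off is the conventional VADER default, and the scores below are made up for illustration):

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a sentiment label using the standard cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Made-up compound scores for illustration
scores = [0.6, -0.4, 0.0, 0.05, -0.049]
labels = [label_sentiment(s) for s in scores]
print(labels)  # → ['positive', 'negative', 'neutral', 'positive', 'neutral']
```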


