Available for opportunities

Data Engineer & ML Specialist

Electrical & Computer Engineer with an Integrated Master's in Telecommunications & Information Systems. Transforming raw data into actionable insights through robust pipelines and intelligent systems.

Python · SQL · Machine Learning · Data Pipelines
efstratios@portfolio:~

$ whoami

Efstratios Demertzoglou - Electrical and Computer Engineer

$ cat education.txt

MSc Electrical & Computer Engineering
Specialization: Telecommunications & Information Systems

$ ls skills/

data_pipelines/ machine_learning/ etl_processes/
signal_processing/ cloud_computing/ statistical_analysis/

$ python analyze_career.py

✓ Data pipelines optimized
✓ Models deployed successfully
✓ Insights generated

Technical Expertise

Bridging telecommunications theory with modern data engineering practices. Specialized in building end-to-end data solutions that scale.

Data Engineering

  • ETL Pipeline Development
  • Data Warehousing & Modeling
  • Real-time Data Processing
  • Database Optimization (SQL/NoSQL)

Machine Learning

  • Predictive Modeling & Analytics
  • Natural Language Processing
  • Recommendation Systems
  • Model Deployment & MLOps

Telecommunications

  • Signal Processing & Analysis
  • Information Systems Design
  • Network Data Analytics
  • System Architecture

Technologies I Work With

Python · Pandas · NumPy · Scikit-learn · TensorFlow · SQL · PostgreSQL · MongoDB · Hadoop · Apache Airflow · Docker · AWS · Git · Linux · MATLAB · Tableau

Featured Projects

End-to-end data solutions showcasing distributed computing, hybrid database architectures, and production-grade machine learning pipelines.

Hybrid SQL/NoSQL · Data Integration

F1 Hybrid Analytics Study

Integrates structured F1 race data (PostgreSQL) with unstructured radio messages (MongoDB) to enable cross-domain analytics. The radio messages are generated synthetically, but each one is tied to an actual race event, so communication patterns can be analyzed and validated against what really happened on track.

Engineered an ETL pipeline in Python (pandas, psycopg2, pymongo) that joins PostgreSQL and MongoDB data
Implemented causal validation: pit messages linked to actual pit stops, retirements to DNF events
Tableau dashboards for interactive exploration by team, circuit, and message type
hybrid_query.py
def analyze_radio_patterns():
    # Cross-database analytics
    sql_data = query_postgresql(race_events)
    mongo_data = query_mongodb(radio_msgs)
    return correlate_events(
        sql_data, 
        mongo_data
    )
PostgreSQL (structured) · MongoDB (unstructured)
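The causal-validation step described above can be sketched as a hash-based semi-join: index the structured race events by their key, then probe that index with each message. This is a minimal plain-Python illustration, not the project's pandas/psycopg2/pymongo pipeline. The `RaceEvent` row shape, the `EXPECTED_EVENT` mapping, the `validate_radio_messages` name, and matching at lap granularity are all hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical mapping: which race event must back each radio-message type.
EXPECTED_EVENT = {"pit": "pit_stop", "retirement": "dnf"}


@dataclass(frozen=True)
class RaceEvent:  # assumed row shape for the PostgreSQL side
    race_id: int
    driver: str
    lap: int
    event: str


def validate_radio_messages(events, messages):
    """Mark a MongoDB-style message dict as valid when a race event of the
    expected kind exists for the same race, driver, and lap (assumed key)."""
    # Build side: hash index over the structured events.
    index = {(e.race_id, e.driver, e.lap, e.event) for e in events}
    results = []
    # Probe side: look each unstructured message up in the index.
    for msg in messages:
        key = (msg["race_id"], msg["driver"], msg["lap"], EXPECTED_EVENT.get(msg["type"]))
        results.append({**msg, "valid": key in index})
    return results


events = [
    RaceEvent(1, "VER", 18, "pit_stop"),
    RaceEvent(1, "HAM", 22, "pit_stop"),
    RaceEvent(1, "LEC", 40, "dnf"),
]
messages = [
    {"race_id": 1, "driver": "VER", "lap": 18, "type": "pit", "text": "Box, box."},
    {"race_id": 1, "driver": "LEC", "lap": 40, "type": "retirement", "text": "Retire the car."},
    {"race_id": 1, "driver": "HAM", "lap": 30, "type": "pit", "text": "Box this lap."},
]
for row in validate_radio_messages(events, messages):
    print(row["driver"], row["lap"], row["type"], row["valid"])
```

Here the HAM message is flagged invalid because no pit stop exists on lap 30. In a pandas version, the same check maps naturally onto a left merge with `indicator=True`.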
Business rating RMSE: 0.36
R² score: 0.86
ROC-AUC: 0.97
Review accuracy: 71.6%
3.36M reviews processed
NLP & ML Pipeline · Explainable AI

Yelp Rating Prediction Engine

Production-grade ML system that predicts both business ratings and individual review scores from TF-IDF text features, using Ridge regression, LightGBM, and logistic regression. Includes token-level explainability and out-of-distribution (OOD) testing across geographic regions (PA/TN vs. FL).

Achieved RMSE 0.36 (R² 0.86) on business rating regression using Ridge + TF-IDF
Built explainability layer with token-level attributions for linear text models
Validated OOD robustness: performance on a held-out Florida test set was on par with in-distribution (PA/TN) performance
Live Demo: Yelp Rating Prediction
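Token-level attributions for a linear text model follow directly from its form: with TF-IDF features x and coefficients w, the prediction is bias + Σ x_t·w_t, so each token t contributes exactly x_t·w_t. The sketch below shows that idea in plain Python under stated assumptions. TF-IDF uses a smoothed idf with L2 normalization, whitespace tokenization is a simplification, and the coefficients and intercept are made up to stand in for a fitted Ridge model. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF with smoothed idf, ln((1 + n) / (1 + df)) + 1, then L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}
    vectors = []
    for toks in tokenized:
        raw = {tok: count * idf[tok] for tok, count in Counter(toks).items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({tok: v / norm for tok, v in raw.items()})
    return vectors


def token_attributions(vector, weights, bias):
    """Linear model: prediction = bias + sum(x_t * w_t), so token t contributes
    exactly x_t * w_t. Returns the prediction and tokens ranked by |contribution|."""
    contribs = {tok: x * weights.get(tok, 0.0) for tok, x in vector.items()}
    prediction = bias + sum(contribs.values())
    ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prediction, ranked


docs = ["great food great service", "slow service cold food", "friendly staff"]
vectors = tfidf_vectors(docs)
# Hypothetical coefficients and intercept, standing in for a fitted Ridge model.
weights = {"great": 0.4, "friendly": 0.3, "slow": -0.5, "cold": -0.45}
bias = 3.5

pred, ranked = token_attributions(vectors[0], weights, bias)
print(round(pred, 2), ranked[0])
```

Because the decomposition is exact for any linear model, the same approach applies to logistic regression on the log-odds scale. Tree ensembles such as LightGBM need a different attribution method.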
Big Data · Hadoop MapReduce

Distributed Movie Rating Analysis

End-to-end Hadoop MapReduce pipeline that processes the MovieLens dataset to compute average ratings by genre and age group. Implements multi-stage distributed joins across ratings, users, and movies using Java and HDFS in a WSL environment.

Architected 3-stage MapReduce pipeline: Ratings-Users join → Movies-Genre join → Genre-Age aggregation
Configured Hadoop 3.4.2 ecosystem (HDFS, YARN) on Windows Subsystem for Linux
Maven build automation with Java 11 for reproducible JAR deployment
MapReduce Pipeline
1. RatingsUsersJoin (MapReduce join)
2. MoviesGenreJoin (MapReduce join)
3. GenreAgeAvg (aggregation)
Output format: Genre | AgeGroup | AvgRating
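The three-stage dataflow above can be simulated in memory to show how each stage works. Stages 1 and 2 are reduce-side joins: mappers tag records by source and key them on the join field, and reducers combine the groups. Stage 3 groups by (genre, age group) and averages. This is a Python sketch of the dataflow only, not the project's Java/Hadoop code. The tagged-tuple record layout, the string age-group labels, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def run_stage(records, mapper, reducer):
    """One in-memory MapReduce round: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return list(chain.from_iterable(reducer(k, vs) for k, vs in groups.items()))


# Stage 1 (RatingsUsersJoin): reduce-side join of ratings and users on user_id.
def ratings_users_map(rec):
    source, *fields = rec
    if source == "rating":
        user_id, movie_id, rating = fields
        yield user_id, ("R", movie_id, rating)
    else:  # "user" record
        user_id, age_group = fields
        yield user_id, ("U", age_group)


def ratings_users_reduce(user_id, values):
    ages = [v[1] for v in values if v[0] == "U"]
    if not ages:  # inner join: drop ratings with no matching user
        return []
    return [("joined", v[1], ages[0], v[2]) for v in values if v[0] == "R"]


# Stage 2 (MoviesGenreJoin): join on movie_id, emit one record per genre.
def movies_genre_map(rec):
    source, *fields = rec
    if source == "joined":
        movie_id, age_group, rating = fields
        yield movie_id, ("J", age_group, rating)
    else:  # "movie" record with pipe-separated genres
        movie_id, genres = fields
        yield movie_id, ("M", genres)


def movies_genre_reduce(movie_id, values):
    genres = [g for v in values if v[0] == "M" for g in v[1].split("|")]
    return [(g, v[1], v[2]) for v in values if v[0] == "J" for g in genres]


# Stage 3 (GenreAgeAvg): average rating per (genre, age group).
def genre_age_map(rec):
    genre, age_group, rating = rec
    yield (genre, age_group), rating


def genre_age_reduce(key, ratings):
    return [(key[0], key[1], round(sum(ratings) / len(ratings), 2))]


ratings = [("rating", 1, 10, 4), ("rating", 1, 20, 2), ("rating", 2, 10, 5)]
users = [("user", 1, "18-24"), ("user", 2, "25-34")]
movies = [("movie", 10, "Action|Comedy"), ("movie", 20, "Drama")]

stage1 = run_stage(ratings + users, ratings_users_map, ratings_users_reduce)
stage2 = run_stage(stage1 + movies, movies_genre_map, movies_genre_reduce)
result = run_stage(stage2, genre_age_map, genre_age_reduce)
for genre, age, avg in sorted(result):
    print(f"{genre} | {age} | {avg}")
```

On a real cluster, each `run_stage` call corresponds to a separate job whose output directory on HDFS becomes the next job's input.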
Real-time Communication · WebSockets

WebSocket Platform

Lightweight sandbox project for persistent client-server communication with WebSockets, using Supabase for backend services and real-time data synchronization. Built to validate connection lifecycle behavior, message broadcasting, and low-latency updates.

Implements bidirectional communication over persistent WebSocket channels
Useful for debugging connection stability, reconnects, and event flow in real time
Provides a clean baseline for experimenting with live updates in web applications
websocket_flow.log
Client connected → ws://server
Handshake complete (101)
Event: message_received
Event: broadcast_sent
Heartbeat ping/pong OK
Learning Highlights
Vocabulary: daily terms
Practice: interactive flow
UI focus: learner-friendly
Progress: habit tracking
Education App · Language Learning

Serbian Buddy App

Language-learning application designed to support Serbian practice through a simple, approachable interface. Focuses on structured learning flow and repeatable exercises that help users build vocabulary and confidence over time.

Promotes consistent Serbian language practice with compact lesson flows
Built with UX simplicity in mind to make daily learning easy to maintain
Acts as a foundation for adding quizzes, phrase banks, and pronunciation tools
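The app's habit-tracking feature is not described in detail, so the following is only a hypothetical sketch of one common approach: a daily-practice streak counted backwards from today. One day of grace is allowed so that a streak is not reported as broken before the user has practiced today. The function name and the grace rule are assumptions.

```python
from datetime import date, timedelta


def current_streak(practice_days: set[date], today: date) -> int:
    """Count consecutive practice days ending today, or ending yesterday if
    the user has not practiced yet today (assumed grace rule)."""
    day = today if today in practice_days else today - timedelta(days=1)
    streak = 0
    while day in practice_days:
        streak += 1
        day -= timedelta(days=1)
    return streak


days = {date(2024, 5, 1), date(2024, 5, 2), date(2024, 5, 3)}
print(current_streak(days, date(2024, 5, 3)))  # 3
```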

Let's Build Data Solutions Together

Looking for Data Engineering or Software Engineering opportunities. Let's discuss how my background in telecommunications and data systems can add value to your team.