Data Engineer & Big Data Specialist
I am a passionate Data Engineer with experience in designing and implementing data pipelines, developing ETL processes, and building data warehouse architectures. I work with modern technologies that keep data processing reliable and efficient.
I hold a Bachelor's degree in Applied Mathematics and Informatics and currently work as a Data Analyst at Lime HD. In this role I go beyond the standard responsibilities of a data analyst and take on a wide range of tasks characteristic of a Data Engineer.
With over a year of experience at an accredited IT company, I am constantly looking for ways to improve processes, automate workflows, and adopt new technologies that enhance efficiency. Working with cross-functional teams, I provide analytical insights that support data-driven decisions and contribute to the growth and success of the organization.
I strive to deepen my expertise in data analysis and engineering, explore new technologies and approaches, and maximize my contribution to reliable and scalable solutions. To my mind, a modern Data Engineer is a technical specialist responsible for the full data lifecycle: from collection and processing to providing analytical tools. The role requires knowledge of programming, big data, infrastructure management, and data quality assurance. This is the direction I aim to grow in, building efficient, reliable, and scalable solutions that deliver real value to the business.
My experience includes:
Data Cleaning & Processing; ETL; Automation; API Development & Integration (building RESTful APIs with FastAPI and integrating third-party APIs with requests and aiohttp); Modeling (predictive and descriptive models); Statistical Analysis; Scripting (writing efficient scripts for data manipulation and task automation)
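A minimal sketch of this kind of API work: a FastAPI endpoint that wraps a third-party service with aiohttp. The upstream URL, route, and response shape below are hypothetical placeholders, not a real service.

```python
# Sketch: a FastAPI endpoint that proxies a third-party API via aiohttp.
# The upstream URL and field names are hypothetical placeholders.
import aiohttp
from fastapi import FastAPI, HTTPException

app = FastAPI()

THIRD_PARTY_URL = "https://api.example.com/metrics"  # hypothetical endpoint

@app.get("/metrics/{source_id}")
async def get_metrics(source_id: str):
    # Fetch data from the upstream API asynchronously.
    async with aiohttp.ClientSession() as session:
        async with session.get(f"{THIRD_PARTY_URL}/{source_id}") as resp:
            if resp.status != 200:
                raise HTTPException(status_code=502, detail="upstream error")
            payload = await resp.json()
    # Return a trimmed, stable response shape to API consumers.
    return {"source": source_id, "data": payload}
```

Served locally with `uvicorn app_module:app --reload`.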
Data Pipelines: Building and optimizing ETL/ELT pipelines
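As a sketch of a typical pipeline shape, assuming a hypothetical HTTP source, made-up column names, and placeholder PostgreSQL credentials (loading via SQLAlchemy requires a driver such as psycopg2):

```python
# Minimal ETL sketch: extract from an HTTP API, transform with pandas,
# load into PostgreSQL. All names and credentials are placeholders.
import pandas as pd
import requests
from sqlalchemy import create_engine

def extract(url: str) -> pd.DataFrame:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    # Assumes the API returns a JSON list of records.
    return pd.DataFrame(resp.json())

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Deduplicate and normalize timestamps before loading.
    df = df.drop_duplicates(subset=["id"])
    df["created_at"] = pd.to_datetime(df["created_at"], utc=True)
    return df

def load(df: pd.DataFrame, table: str) -> None:
    engine = create_engine("postgresql://user:pass@localhost:5432/dwh")
    df.to_sql(table, engine, if_exists="append", index=False)

if __name__ == "__main__":
    load(transform(extract("https://api.example.com/events")), "events_raw")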
Data Warehousing: Designing scalable architectures that prepare data for reporting and business analysis to support decision-making across the organization
Creating detailed documentation for data solutions, standardizing workflows, and ensuring maintainable processes; versioning and managing documentation using tools like Confluence, Notion, or Git
Designing interactive and real-time dashboards using Grafana, Yandex DataLens, ReDash, etc.; Creating insightful and customizable visualizations (Matplotlib, Plotly, Seaborn, Highcharts JS); Reporting Tools (Generating automated and presentation-ready reports with Pandas Styler, openpyxl, ReportLab)
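A small illustration of the visualization side, using Plotly Express on synthetic data; the metric and output file name are made up for the example.

```python
# Sketch: a shareable, presentation-ready chart with Plotly Express.
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=30, freq="D"),
    "active_users": range(100, 130),  # synthetic metric values
})
fig = px.line(df, x="date", y="active_users", title="Daily Active Users")
fig.update_layout(template="plotly_white")
fig.write_html("dau_report.html")  # self-contained HTML report
```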
Relational & Analytics Databases: MySQL, Postgres (Query Optimization, ProxySQL, Backup & Restore); ClickHouse (Columnar Storage, High-Speed OLAP Queries, Distributed Clusters, Real-Time Analytics); NoSQL: Redis (In-Memory Key-Value Store)
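For example, a minimal sketch of an OLAP aggregation against ClickHouse from Python, assuming the clickhouse-driver package and a hypothetical events table:

```python
# Sketch: a week-over-day event count on ClickHouse.
# The `events` table and its `ts` column are hypothetical.
from clickhouse_driver import Client

client = Client(host="localhost")

rows = client.execute(
    """
    SELECT toDate(ts) AS day, count() AS events
    FROM events
    WHERE ts >= now() - INTERVAL 7 DAY
    GROUP BY day
    ORDER BY day
    """
)
for day, events in rows:
    print(day, events)
```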
Supervised & Unsupervised Learning, Deep Learning (TensorFlow, Keras), Hyperparameter Tuning, Feature Engineering (Extracting and transforming features), Time-Series Analysis, Ensemble Methods (Combining models with stacking, bagging, and boosting techniques for robust predictions), Model Deployment
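A compact sketch of the ensemble idea: stacking two scikit-learn base models under a logistic-regression meta-model, evaluated on synthetic data.

```python
# Sketch: a stacking ensemble with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),  # meta-model combines base predictions
)
print(cross_val_score(stack, X, y, cv=5).mean())
```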
Containerization & Orchestration (Docker, docker-compose for building, deploying, and managing containerized applications at scale), CI/CD Integration, Monitoring (Zabbix, Prometheus, Grafana), Version Control (GitLab)
Service Management (systemd), Logs & Diagnostics (journalctl), User Management, Permissions, Process Monitoring (top, htop, ps), Scheduling Tasks (cron, systemd timer), Networking (iptables, ip, netstat), File System Management
Workflow Scheduling & Orchestration (Designing, scheduling, and orchestrating complex data workflows), DAG Management (Building, maintaining, and optimizing Directed Acyclic Graphs (DAGs) for efficient workflow execution), Deploying and configuring Airflow for distributed execution with Celery
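A minimal DAG sketch, assuming Airflow 2.4+; the dag_id, schedule, and task bodies are illustrative placeholders rather than a production workflow.

```python
# Sketch: a small daily DAG with two ordered tasks.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from source systems")  # placeholder task body

def load():
    print("loading transformed data into the warehouse")  # placeholder

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # edge: load runs only after extract succeeds
```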
Effectively communicating with analysts to translate business requirements into technical solutions; Collaborating with database administrators to optimize database performance and ensure data integrity; Working with DevOps teams to maintain data infrastructure stability; Partnering with backend developers to design scalable solutions and integrate APIs for seamless data flow; Facilitating cross-team collaboration to align on goals, streamline workflows, and deliver results
ELK Stack: ElasticSearch (Full-Text Search), Logstash (data ingestion pipelines, parsing logs), Kibana (Data Visualization & Dashboards); Zabbix: Monitoring System Performance, Metrics Collection; Prometheus: Collecting and querying metrics using PromQL for application and infrastructure performance monitoring; Grafana: Configuring alerts and thresholds for proactive monitoring
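A small sketch of how a job can expose custom metrics for Prometheus to scrape, using the official prometheus_client package; the metric names and update loop are illustrative.

```python
# Sketch: exposing custom ETL metrics on :8000/metrics for Prometheus.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

ROWS_PROCESSED = Counter("etl_rows_processed_total", "Rows processed by the ETL job")
LAG_SECONDS = Gauge("etl_lag_seconds", "Seconds since the last successful load")

if __name__ == "__main__":
    start_http_server(8000)  # metrics endpoint for the Prometheus scraper
    while True:
        ROWS_PROCESSED.inc(random.randint(50, 150))  # simulated throughput
        LAG_SECONDS.set(random.uniform(0, 30))       # simulated freshness lag
        time.sleep(5)
```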
• Designed and implemented full-scale ETL/ELT pipelines for data integration and transformation from various sources, including APIs, databases, and web scraping.
• Built and maintained infrastructure for data processing and monitoring using tools such as Grafana, Prometheus, and ClickHouse.
• Automated data workflows with scripts (Python, Bash) and Docker containers, improving efficiency and reliability of daily operations.
• Developed dashboards to visualize critical technical and business metrics, enabling informed decision-making.
• Conducted exploratory data analysis (EDA) to identify key trends and insights, supporting various business needs.
• Collaborated with cross-functional teams to integrate and standardize data from multiple sources for unified reporting.
• Provided actionable analytics to support decision-making and process optimization within the analytics department.
Chuvash State University (ChuvSU) named after I. N. Ulyanov | 2023 - 2025
Chuvash State University (ChuvSU) named after I. N. Ulyanov | 2019 - 2023
View Certificate | Awarded on: 31 Aug 2022
International Student Scientific Conference on Technical, Humanitarian, and Natural Sciences
Presented the paper "Modeling Short-Term Inflation Forecasts" in the section "Mathematical Models in Economics and Numerical Analysis" | 2023
View Certificate | 2022
Interested in discussing a project, collaborating, or hiring me?
Reach out via phone, email, or LinkedIn: