This past February, I was lucky enough to be selected as a data science intern. It was my first internship. I was excited to experience the professional world and eager to learn new things that would prepare me to become a successful data scientist, but I was also worried about a few things. Now, after my internship, I feel that it was the best decision I have made. I am grateful for the ways this internship expanded my learning, enhanced my personal and professional skills, and enabled me to apply my academic knowledge, ideas, and theories to real-world problems. This essay reflects on the learning opportunities, the skills I acquired, and the experiences I gained throughout my internship.
Data science is one of the fastest-evolving fields of the 21st century. Advances in technology have led to the production of massive amounts of data. Data is present everywhere and is among the most valuable assets of an organization, because data-driven decisions enable businesses and organizations to increase their profitability, improve their operational efficiency, enhance performance, anticipate future challenges, and find new avenues for development. The numerous applications of data science have put it in demand at organizations small and large. This growing demand produces an abundance of job opportunities across multiple sectors, such as retail, health, banking, e-commerce, and airlines. Data science also requires strong domain knowledge, programming skills, and mathematics, which has produced a considerable gap between the demand for and the supply of data scientists. A data scientist who knows these disciplines has many opportunities. The ease of finding jobs, the growing demand, the well-paid career path, and the competitive nature of the field attracted me to data science.
I have always been enthralled by the range of real-world applications of data science. I wanted to acquire the skills and knowledge that would lead me to become a successful data scientist. After completing my undergraduate degree in Computer Science, I was admitted to the Master of Data Science at the University of Melbourne. Coming from a computer science background, I had extensive knowledge of different programming languages, but I entered this course to learn the statistical and mathematical concepts, because data science is a multidisciplinary field that requires expertise in mathematics, computer science, and statistics. As a data science student, I studied subjects such as statistical machine learning, advanced database management systems, knowledge technologies, and various statistics subjects, which built my statistical foundation and enhanced my programming skills. After one year of my course, I wanted to get hands-on experience on a real-world project, apply my academic knowledge in a real work environment, analyze my strengths and weaknesses, and learn how data science fits into the real world. To get exposure to the corporate world, I started applying for internships via my university portal and got a wonderful opportunity at the Doherty Institute.
Working at the Doherty Institute was an inspiring and rich experience. The institute is a joint venture between The University of Melbourne and The Royal Melbourne Hospital. Its vision is to improve health globally through discovery research and the prevention, treatment, and cure of infectious diseases. I worked in the department of epidemiology. The organization's easy-going culture, its commitment to the work, and how much people enjoyed and were involved in their work amazed me the most. On the first day of the internship, my supervisor introduced me to the members of the department and gave me a brief overview of the project, its tasks, and the timeline for completing it. I also had a workplace tour that day. My supervisor was happy for me to contribute as much as I could and was open to ideas and suggestions, which made me feel more comfortable at work.
The best part of my internship was the project I worked on. It was an ongoing research project exploring veterinary practice, diagnosis, and treatment prescription in Bhutan. The goal of the project was to analyze how each syndrome is treated differently in different hospitals. I was assigned to work with data from one of the Bhutanese District Veterinary Hospitals and performed tasks such as data entry, data cleaning, data exploration, data analysis, and data visualization. What excited me most about this project was that it related to the healthcare sector, a completely new area that I would get a chance to explore.
To produce great insights, every data science project requires domain knowledge. As I was not familiar with the healthcare sector, asking my supervisor questions and reading relevant literature online helped me gain a clear understanding of the domain and the business problem. The first phase of the project was tedious data entry, but it made me familiar with various terms for veterinary infections, diagnoses, and treatments. It also improved my Excel skills, which I had rarely used at university. Once I had a solid understanding of the domain and had completed the data entry, the next step was data pre-processing. Data pre-processing is a data mining technique used to transform raw data into an understandable format, which matters because incorrect and inconsistent data produce wrong insights. I applied various statistical methods that I had learned at university to identify missing values, outliers, and duplicates, and so increase the correctness and consistency of the data. The next step was to aggregate the data by treatment family. My supervisor provided me with the list of families and the treatments that belong to each. To accomplish this, I wrote a lot of regular expressions, which extract meaningful information from the data through string pattern matching. I had learned the theory of regular expressions in my knowledge technologies subject, and applying them here helped me complete this task efficiently and also enhanced my skills.
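The family-matching step described above can be illustrated with a minimal sketch. It is written in Python for illustration and is not taken from the project. The family names, the drug names in the patterns, and the functions `treatment_families` and `clean_records` are all assumptions, since the actual list of families and treatments is not given here.

```python
import re

# Hypothetical family -> pattern table, invented for illustration only.
# The real list of families and treatments is not reproduced in this essay.
FAMILY_PATTERNS = {
    "antibiotic": re.compile(r"\b(oxytetracycline|penicillin)\b", re.IGNORECASE),
    "anthelmintic": re.compile(r"\b(albendazole|fenbendazole)\b", re.IGNORECASE),
}


def treatment_families(text):
    """Return the sorted names of all families whose pattern matches the text."""
    return sorted(
        name for name, pattern in FAMILY_PATTERNS.items() if pattern.search(text)
    )


def clean_records(records):
    """Drop records with a missing treatment and exact duplicate records."""
    seen = set()
    cleaned = []
    for species, treatment in records:
        if not treatment or (species, treatment) in seen:
            continue  # skip missing values and repeated rows
        seen.add((species, treatment))
        cleaned.append((species, treatment))
    return cleaned
```

Keeping the patterns in one table means a new family can be added without touching the matching logic, and a free-text entry that mentions several drugs is assigned to every family it matches.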
After the data entry and data cleaning, the next step was to explore the data, perform statistical analysis, and visualize the data. I was introduced to a collection of R packages that was completely new to me, the tidyverse, which I used in RStudio. The tidyverse includes packages such as dplyr for data manipulation and ggplot2 for visualization; dplyr provides functions such as filter() to filter the data and summarise() to summarize it by group, for example by species. The data was then visualized using ggplot2, and the results were reported to my supervisor. I enjoyed the initial days, as applying what I had learned at university to this real-world project boosted my confidence. Furthermore, the knowledge I had gained at university, such as statistical methods, visualization techniques, and the libraries I had worked with, helped me produce better results. Over the course of my internship, I learned new libraries such as the tidyverse in R and became familiar with data visualization software such as Tableau, which enhanced my skills for my career.
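The idea behind filter() and summarise() is to keep only the rows of interest and then reduce them to one value per group. A minimal sketch of that idea in plain Python follows. It is an analogue for illustration, not the tidyverse code used in the project, and the records and the helper names `filter_rows` and `count_by` are hypothetical.

```python
from collections import Counter

# Hypothetical records, invented for illustration only.
records = [
    {"species": "cattle", "family": "antibiotic"},
    {"species": "cattle", "family": "anthelmintic"},
    {"species": "dog", "family": "antibiotic"},
    {"species": "cattle", "family": "antibiotic"},
]


def filter_rows(rows, **conditions):
    """Keep the rows whose fields equal every given condition."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in conditions.items())
    ]


def count_by(rows, key):
    """Count the rows in each group defined by the given field."""
    return dict(Counter(row[key] for row in rows))


# Number of antibiotic treatments per species.
antibiotics_per_species = count_by(filter_rows(records, family="antibiotic"), "species")
```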
I also learned about another important aspect, data privacy, and how it plays a vital role in the field of data science. Data is a core component of industry and is used to produce meaningful insights, but it can be misused, so it is important to protect it. Clients provide data under certain terms and conditions. I had to delete all the data from the system at the end of each day, and I could not transfer the dataset to my own system because of data privacy concerns. This made me appreciate the importance of data even more. It became a problem when the COVID-19 pandemic began and I was asked to work from home due to the lockdown, so I took part of the data home for further reporting.
One of the challenges I faced during the internship was making reasoned judgments and finding solutions independently every day. At the start of my internship, I faced some challenges with problem-solving: whenever I was unable to implement a method, I was always tempted to ask questions rather than solve it myself. Working more on the problem and going through each line of code increased my understanding, and with time problem-solving became more enjoyable. I also learned that a positive attitude towards challenges, combined with a keenness to take personal responsibility when seeking a way around a problem, is the best way to solve it.
In addition to technical skills, I significantly developed soft skills such as communication, time management, networking, and self-discipline. Communication plays a vital role in the professional world. Client meetings, daily meetings with my supervisor, and small talk with people from different departments improved my verbal communication skills. Along with this, I came to see how crucial non-verbal communication is, for example how body language and eye contact matter when communicating at the workplace. Communicating with different people in the workplace helped me build connections. At first, I was unable to interact much with different people, but over time I learned to start a conversation with small talk, and asking questions then helped me learn more about their interests and the work they were doing. I also added some of them on LinkedIn, which expanded my network. Having lunch with colleagues and attending seminars helped me build a network with other people in the workplace. As I was doing the internship during the semester, planning and organizing helped me manage the workload alongside my ongoing university subjects and projects. Earlier, I only had to attend lectures and work on assignments, but with the internship I also had to go to the workplace. The first week was not hectic, but with time it became so. As the workload increased, I managed my priorities by dividing the tasks into categories and then completing them in order of priority. In this way, I did not miss deadlines and delivered the project on time. Sometimes, when some tasks were more time-consuming, I dropped low-priority tasks in order to complete them. Planning and organizing also helped me keep a balance between my studies, internship, and personal life.
My experience at the Doherty Institute was phenomenal. I began as a newcomer to the corporate world of data science, excited to learn new things. I discovered a lot about myself: how to be patient, how to stay positive, and which skills I need to work on more. I also learned how data science principles can be applied to solve healthcare challenges. Moving from being a student to working as a professional in the industry was a different experience. Applying what I had learned at university increased my confidence, and the skills and knowledge I gained at the workplace complemented my learning.