Exploring YouTube Data: A Data Driven Approach

Updated:	08.07.2022

Abstract

Video watching has emerged as one of the most frequent media activities on the Internet. Yet little is known about how users watch online video. Using two distinct YouTube datasets, a set of random YouTube videos crawled from the Web and a set of videos watched by participants tracked by a YouTube developer app, this paper examines whether and how indicators of collective preferences and reactions are associated with the view duration of videos. It shows that video view duration is positively associated with the video's view count, the number of likes per view, and the negative sentiment in the comments. These metrics and reactions have significant predictive power over how long a video is watched by individuals. Our findings provide a more precise understanding of user engagement with video content in social media, beyond view count.

Introduction

Video watching is perhaps the most popular web-based activity, through video hosting and sharing services such as YouTube, Facebook, Netflix, Vimeo, and others. As of 2015, YouTube alone had more than 1 billion viewers every day, watching hundreds of millions of hours of content. Video was forecast to represent 80 percent of all traffic by 2019. Yet little is known about how users engage with and watch online video. We use two distinct datasets from YouTube to investigate how users' engagement in watching a video (i.e., view duration) is associated with other video metrics such as the number of views, likes, and comments, and the sentiment of the comments.
A number of research efforts have investigated view count as a key indicator of the popularity or quality of a video, particularly its relationships with other popularity or preference metrics (e.g., the number of likes and comments). For example, the number of comments, favourites, and ratings and the average rating are significant predictors of video view counts on YouTube; the sequence of comments and its structure are strongly associated with view counts; and view counts can be predicted from socially shared viewing behaviours around the content, such as how many times a video was rewound or fast-forwarded and the duration of the session, in a tool that allows people to watch videos together in sync and in real time.
Although views, likes, comments, and other such measures can be considered indicators of general popularity and preferences, there has been growing interest in using deeper post-click user engagement (e.g., how long a user watched a video) to estimate relevance and interest more accurately and to improve ranking and recommendation. For example, YouTube has started to use ‘dwell time’ (the length of time that a user spends on a video, e.g., the length of a video watching session) instead of click events to better measure engagement with video content. Beyond video, Facebook is using dwell time on external links to combat clickbait: stories with arousing headlines that attract users to click and share more than usual but are not consumed in depth.

Data Collection

For the Individual Logs dataset, our view duration dependent variable was computed differently. In this case, we used an individual, but approximate, view duration measurement. In particular, we used the user's dwell time on each video's page, as measured by the extension, as an approximation of the actual view time for the video by that user.
We gathered the data by automating queries and keyword-based searches to collect videos and their corresponding comments. Python scripts using the YouTube APIs were used to extract information about each video (comments and their timestamps). We collected up to 1000 comments per video (YouTube allows a maximum of 1000 comments per video to be accessed through the APIs), and used keywords such as 'Federer', 'Nadal', and 'Obama' to collect the data for specific keywords. The timestamp and author name of each video were also collected. The final dataset used for the sentiment analysis had more than 3000 videos and more than 7 million comments.
We performed data pre-processing on the collected comments. YouTube comments are written in several languages, depending on the demographics of the commenters. However, to simplify the sentiment analysis, we modified the data collection scripts to collect only English comments. From the collected English comments, only comments in the standard UTF-8 encoding were selected, in order to remove comments with unwanted characters.
The steps below explain the procedure for collecting the comments, with their respective timestamps and author names, for the keywords specified by the user. In steps 2-4, the Google APIs for YouTube are used to configure the query with the number of videos to be fetched, the language of interest for comments, the search keyword, and how the comments are to be sorted. Step 5 collects the IDs of the videos related to the specified keyword. Steps 6 and 7 collect the comments associated with these videos and extract the timestamps, author names, and comment text from the comment entries. All the comments for a single keyword are aggregated into one dataset, which is used as the test set, as follows:

Step 1: Prompt the user to specify the search keyword (keywords) and number of videos (numVideos)
Step 2: Set maxNumVideos = min(50, numVideos) (as Google limits the maximum number of videos fetched in one iteration to 50)
Step 3: Set up the YouTube client to use the YouTubeService() API to communicate with the YouTube servers
Step 4: Use the YouTubeVideoQuery() API to set the query parameters such as language, search keyword, etc.
Step 5: Perform successive queries to get the videoID of each video related to the keyword
Step 6: Collect the comments associated with each videoID using the GetYouTubeVideoCommentFeed() API (maximum limit of comments per video is 1000)
Step 7: Extract the comments with their respective timestamps and author names
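
The batching logic in these steps can be sketched as follows. This is only an illustration under stated assumptions, not the scripts used for the study: `search_page` and `fetch_comments` are hypothetical callables standing in for the real API client, and the dictionary keys `timestamp`, `author`, and `text` are likewise assumed. The sketch shows how a per-request cap of 50 videos leads to successive queries, and how the comments for one keyword are aggregated into a single dataset.

```python
from typing import Callable, Dict, List

MAX_VIDEOS_PER_QUERY = 50      # per-request cap on videos (Step 2)
MAX_COMMENTS_PER_VIDEO = 1000  # per-video cap on comments (Step 6)

# Hypothetical stand-ins for the real API client set up in Steps 3-4:
#   search_page(keyword, start_index, max_results) -> list of video IDs
#   fetch_comments(video_id, limit) -> list of dicts with the assumed keys
#       "timestamp", "author" and "text"
SearchPage = Callable[[str, int, int], List[str]]
FetchComments = Callable[[str, int], List[Dict[str, str]]]


def collect_video_ids(search_page: SearchPage, keyword: str,
                      num_videos: int) -> List[str]:
    """Step 5: query in batches of at most 50 until enough IDs are found."""
    video_ids: List[str] = []
    while len(video_ids) < num_videos:
        # Never ask for more than the cap or more than is still needed.
        batch_size = min(MAX_VIDEOS_PER_QUERY, num_videos - len(video_ids))
        batch = search_page(keyword, len(video_ids), batch_size)
        if not batch:  # no further results for this keyword
            break
        video_ids.extend(batch[:batch_size])
    return video_ids


def collect_comments(search_page: SearchPage, fetch_comments: FetchComments,
                     keyword: str, num_videos: int) -> List[Dict[str, str]]:
    """Steps 5-7: aggregate the comments for one keyword into one dataset."""
    dataset: List[Dict[str, str]] = []
    for video_id in collect_video_ids(search_page, keyword, num_videos):
        entries = fetch_comments(video_id, MAX_COMMENTS_PER_VIDEO)
        for entry in entries[:MAX_COMMENTS_PER_VIDEO]:
            dataset.append({
                "video_id": video_id,
                "timestamp": entry["timestamp"],
                "author": entry["author"],
                "text": entry["text"],
            })
    return dataset
```

Passing the API calls in as functions keeps the pagination logic independent of any particular client library, so it can be exercised with stub functions before being connected to a real service.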

From 105 days of observation, it can be seen that a particular video trended for at most 14 days. There are 604 videos that appeared in the YouTube trending video list only once. One interesting point is that the correlation between likes and comment count is 0.71, while the correlation between dislikes and comment count is 0.83. This suggests that more people joined the conversation when they disliked a video than when they liked it. In most of these cases, the video might be controversial, fake news, etc.
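
Assuming the correlations above are Pearson coefficients (the text does not say which coefficient was used), a minimal sketch of the calculation is shown below. The numbers in the usage lines are made-up toy values for illustration only, not the trending dataset.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values, purely illustrative (not taken from the study's data).
likes = [120, 4500, 300, 9800, 50]
dislikes = [10, 900, 15, 2500, 2]
comment_count = [8, 700, 20, 1900, 1]

print("likes vs comments:   ", round(pearson(likes, comment_count), 2))
print("dislikes vs comments:", round(pearson(dislikes, comment_count), 2))
```

A coefficient close to 1 indicates that the two counts rise together; it does not by itself show that one causes the other.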

From the above plot we can see that there is a very strong relationship between views and likes; the correlation between them is 0.82. Because log10 is applied on the x-axis and a few videos in the YouTube trending list have 0 likes, we plot the variable (likes+1) instead of likes on the scale_x_log10() axis. This avoids infinite values (since log10(0) = -Inf). Therefore, on the x-axis of the above plot, the value 1 represents 0 likes. There are many outliers on the y-axis at x = 1. The authors of many of those videos might have disabled video ratings, so users cannot like or dislike the video. Another point to note is that beyond 10^4 = 10,000 likes, the variance of likes decreases as views increase.
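
The (likes+1) shift can be illustrated with a short Python sketch. The plot itself was drawn with scale_x_log10(); the helper name `log10_likes` below is hypothetical. Note that Python's `math.log10(0)` raises a `ValueError` rather than returning negative infinity, but the remedy is the same.

```python
import math


def log10_likes(likes: int) -> float:
    """Return log10(likes + 1), which is defined even when likes == 0."""
    if likes < 0:
        raise ValueError("likes cannot be negative")
    return math.log10(likes + 1)


# A video with 0 likes lands at axis value 1, i.e. log10(1) = 0.
for likes in (0, 9, 99, 9999):
    print(likes, "likes -> axis value", likes + 1, "-> log10 =", log10_likes(likes))
```

For large like counts the shift is negligible, so it changes the picture only near the low end of the axis.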

References

Alhabash, S.; Baek, J.-h.; Cunningham, C.; and Hagerstrom, A. 2015. To Comment or Not to Comment?: How Virality, Arousal Level, and Commenting Behavior on YouTube Videos Affect Civic Behavioral Intentions. Computers in Human Behavior.
Arapakis, I.; Lalmas, M.; Cambazoglu, B. B.; Marcos, M.-C.; and Jose, J. M. 2014. User Engagement in Online News: Under the Scope of Sentiment, Interest, Affect, and Gaze. Journal of the Association for Information Science and Technology.
Baym, N. K. 2013. Data Not Seen: The Uses and Shortcomings of Social Media Metrics. First Monday.
Berger, J., and Milkman, K. L. 2012. What Makes Online Content Viral? Journal of Marketing Research.
Chatzopoulou, G.; Sheng, C.; and Faloutsos, M. 2010. A First Step Towards Understanding Popularity in YouTube. In Proc. of INFOCOM.
Cisco. 2015. Global IP Traffic Forecast. http://www.cisco.com/c/en/us/solutions/service-provider/visual-networking-indexvni/ index.html.
Cramer, H. 2015. Effects of Ad Quality & Content-Relevance on Perceived Content Quality. In Proc. of CHI.
De Choudhury, M.; Sundaram, H.; John, A.; and Seligmann, D. D. 2009. What Makes Conversations Interesting?: Themes, Participants and Consequences of Conversations in Online Social Media. In Proc. of WWW.
El-Arini, K., and Tang, J. 2014. News Feed FYI: Clickbaiting. http://newsroom.fb.com/news/2014/08/news-feed-fyiclick-baiting.
Hutto, C., and Gilbert, E. 2014. VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. In Proc. of ICWSM.

Cite this paper

Exploring YouTube Data: A Data Driven Approach. (2022, July 08). Edubirdie. Retrieved April 10, 2025, from https://hub.edubirdie.com/examples/exploring-youtube-data-a-data-driven-approach/
