Experts vs. Novices: Dimensions of Tagging Behaviour in an Educational Setting

The organization and representation of information and knowledge have always been exclusively the domain of professionals and experts. This has begun to change with the development of folksonomies as alternative, user-generated models of organizing information. The aim of this paper is to examine the efficiency of tagging and folksonomies. The flexibility of tagging allows users to classify their collections of items in ways that they find useful, but the personalized variety of terms can present challenges when searching and browsing. In order to determine the efficiency of tagging, research evidence about the nature of tagging and the tagging behaviour of specific user groups is needed. This paper contributes to research in this domain by presenting findings from a study exploring differences between expert and novice tagging. Freshman students, who had no prior knowledge of tagging or indexing and were therefore classified as novices, were given an article in the social bookmarking service Delicious. Each student was asked to assign tags to the article based only on its title, subtitle and abstract, and to do the same again after reading the whole article. The same procedure was repeated with postgraduate students from the Department of Information Sciences, who had sufficient experience and knowledge in tagging and indexing. In this way, differences and similarities between tagging by more advanced users and tagging by average or amateur users could be analyzed and compared. The research revealed differences in tag numbers and tag distributions. The findings indicate more precision and consistency in the tagging of the expert group, suggesting that education in tagging could raise the quality of folksonomies in the long term.


Introduction
The age of Web 3.0 is coming, and it is bringing an information revolution just as big as that of Web 2.0 almost a decade ago. Web 2.0 websites allow users to do more than just access or retrieve information; rather, they have become the centre of information channels. Web 2.0 offered all users the same freedom to contribute as a platform for participation, with a focus on communities, sharing content or user-generated content, and interchange of data (Dasgupta and Dasgupta, 2009). Users have changed their role by becoming information producers, creators and co-creators. They influence the composition and design of systems and services by adding and organizing their own content. New information systems based on Web 2.0 applications and services are shaped by user input, and systems' responses are influenced by the search activities of former users. Entirely new types of information resources, new models of the various forms of information seeking behaviour, as well as new aspects of user expectations have emerged as a result of this change and development (Špiranec and Banek Zorica, 2010, p. 142). Examples of Web 2.0 use include Delicious, Flickr, YouTube, blogs, Wikipedia, social tagging, folksonomies and Google (O'Reilly, 2005).
Nevertheless, ever since the appearance of Web 2.0 tools and applications, criticism has been inevitable. According to its opponents, Web 2.0 has created a cult of digital narcissism and amateurism, which undermines the notion of expertise by allowing anybody, anywhere, to share and place undue value upon their own opinions about any subject and to post any kind of content, regardless of their particular talents, knowledge, credentials, biases or possible hidden agendas.
Given this inflation of data and information, it is crucial to bring order into the information chaos, and different approaches to this aim have been proposed. Of course, the problem of organizing information is not new. However, throughout history, endeavours in organizing information were never brought to perfection. Different knowledge organization systems were developed, but every approach implemented came with drawbacks. Novel forms of organizing information in Web 2.0 environments, folksonomies and tagging, have added an interesting twist to the traditional debate on how to optimize access to information. But before drawing conclusions on the potential of folksonomies and the efficiency of tagging in organizing information, more research data on the nature of tagging and tagging behaviour is needed. This paper contributes to research insights and knowledge on the efficiency of tagging and folksonomies by comparing the tagging behaviour of novices and experts in an academic setting.

Setting the Scene: New Contours of Organizing Knowledge
A primary concern of the information domain is organizing information. To this end, languages for document representation and organization have been used. Information professionals have developed indexing languages, which can be described as a set of terms used to represent topics or features of documents, together with the rules for combining or using those terms. However, information organization and knowledge representation are very complex processes, and throughout history no perfect solution to organizing or indexing information has been found. Approaches were usually expert-oriented, meaning that either professional indexers (subject headings, descriptors, and other bibliographic data) or authors (keywords, full text) were the indexing agents.
Web 2.0 and the explosion of user-generated content did not bring a solution to the practices of organizing or indexing information but have, on the contrary, deepened existing problems. The growth of user-generated content increased the demand for suitable methods and facilities for information storage and retrieval. Therefore, companies (and individuals) have developed collaborative information services, e.g. social bookmarking, photo sharing and video sharing, that enable users to store and publish information, but also to index and organize it via tagging.

Tagging, Tags and Folksonomies: Basic Features
In online computer systems terminology, a tag is a non-hierarchical keyword or term assigned to a piece of information, a kind of metadata that helps describe an item and allows it to be found again by browsing or searching. Tagging is a kind of indexing, but the term "indexing" usually describes work done by professionals, whereas tagging is done by anyone interested in sharing tags (identified terms) to describe a document in a public network space. Tags are generally chosen informally and personally by the item's creator or by its viewer, depending on the system. In other words, the user and/or producer of the information becomes an indexer. Social indexing, social tagging or collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content (Golder and Huberman, 2006, p. 198), while the totality of all tags of any given information platform forms a folksonomy. Unlike metadata assigned by authors or by professional indexers in libraries, each end-user's tags reflect that end-user's personal understanding of the content (Tsai, Hwang and Tang, 2011, p. 272). Folksonomies are organic by nature; without predetermining or regulating the tagging process, they mirror the understandings and perceptions of users and develop and advance with usage.

Positive and Negative Aspects of Folksonomies
Advantages and disadvantages of tagging and folksonomies have been analyzed by many authors (Mathes, 2004; Quintarelli, 2005; Fichter, 2006; Munk and Mørk, 2007; Guy and Tonkin, 2006; Golder and Huberman, 2006). A common theme that emerges in discussions of the positive aspects of tagging is the comparison of folksonomies with traditional means of knowledge organization such as controlled vocabularies. Those were criticized for a number of reasons even before the advent of folksonomies. It is exactly the negative features of controlled vocabularies that are counted as advantages of social tagging. According to Peters (2009, p. 227), the positive aspects of tagging and folksonomies can be summarized as follows:
◊ Folksonomies authentically reflect the users' language (rather than that of creators or indexers)
◊ Broaden access to information resources
◊ Divide the burden of indexing among many shoulders
◊ Are the only possibility of indexing mass information on the internet
◊ Reflect the spirit of modern time (new problems, new issues, subjects, innovations, research fields)
◊ Allow multicultural views (before, one viewpoint was tolerable because collections had local character)
◊ Are user-friendly
◊ Are up-to-date, etc.
However, tagging and folksonomies also have significant disadvantages. When users can freely choose tags, as opposed to selecting terms from a controlled vocabulary, the resulting metadata can include homonyms and synonyms. This may lead to inappropriate connections between items and inefficient searches for information about a subject (Golder and Huberman, 2006, p. 199). Often criticized is the users' tendency to assign unspecific and broad tags (Munk and Mørk, 2007, p. 116) or the subjectivity of tags that only the tagger understands (Rolla, 2009, p. 178). Furthermore, tagging systems open to the public are also open to tag spam, in which people apply an excessive number of tags or unrelated tags to an item (such as a YouTube video) in order to attract viewers or readers. In addition, many tagging systems allow only one-word tags to be indexed, which leads to a confusing variety of compounds. Common critical observations also refer to the loss of the context of indexing, different levels of indexing, high percentages of misspelling, the mixing of languages, the lack of a separation between formal (bibliographical) and aboutness tags, etc. Taken together, the strengths (and, conversely, the weaknesses) of tagging as a new approach to information organization and a potential solution to the problem of organizing knowledge in contemporary information environments and organizations can only be determined through research and studies. The present study aims to contribute to the body of research findings in this domain and to gain insight into a specific niche of interest in the field: the efficiency of tagging by experts and novices in an academic environment.

Previous Research in Tagging Behaviour
In order to determine the prospects of tagging as a viable approach to organizing information and representing knowledge, concrete empirical research data is needed. Emerging strands of research show that studies of tagging behaviour predominantly cover tagging motivation, laws regulating the distribution of tags, the determination of tag categories, and comparisons between tags and subject headings or between professional and amateur tags.
On a very generic level, tagging behaviour can be described as the relationship between users and tags. According to Peters (2009, p. 184), tagging behaviour deals specifically with users' behaviour during tagging and provides answers to questions such as: why does the user tag, and what facilities can be used to make tagging easier or give it a structure? Other studies concentrate on different research questions, such as: in what way do users assign tags, what motivates them, how many tags do they add to a source, are there specific differences between diverse user groups, do users tag differently in different environments and for different purposes, and will (expert) knowledge and competencies regarding information organization or the concept of tagging generate more efficient tags?
Many studies have dealt with the average number of tags allocated to a resource. Heckner, Neubauer and Wolff (2008) demonstrate that tag frequency varies from one service to another (e.g. scientifically oriented services like Connotea vs. general services like Delicious). According to this study, Connotea resources are indexed with 4.22 tags on average, in comparison to Delicious or Flickr with fewer than 3 tags per resource. Evidently, the type of service and the nature of the resources trigger different tagging behaviour.
When exploring aspects of tag frequency, researchers have observed that the distribution of tags follows a power law, where a relatively small number of tags are used frequently while a large number of tags are used infrequently (Munk and Mørk, 2007, p. 116).
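The long-tail shape described above can be illustrated with a minimal sketch. The tag assignments below are invented for illustration and are not data from any of the cited studies; `rank_frequency` and `singleton_share` are hypothetical helper names.

```python
from collections import Counter


def rank_frequency(assignments):
    """Return (tag, count) pairs ordered from most to least frequent."""
    counts = Counter(tag for tags in assignments for tag in tags)
    return counts.most_common()


def singleton_share(assignments):
    """Share of distinct tags that were assigned exactly once."""
    counts = Counter(tag for tags in assignments for tag in tags)
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts)


# Hypothetical tag sets from five users for one resource (illustrative only).
assignments = [
    ["folksonomy", "tagging"],
    ["folksonomy", "tagging", "web2.0"],
    ["folksonomy", "indexing"],
    ["folksonomy", "toread"],
    ["folksonomy", "tagging", "metadata"],
]

ranked = rank_frequency(assignments)
share = singleton_share(assignments)

for tag, count in ranked:
    print(f"{tag:12s} {count}")
print(f"share of tags used once: {share:.2f}")
```

In this toy sample one tag dominates and most distinct tags occur only once, which is the qualitative pattern a power-law distribution would produce. Whether real data actually follow a power law would have to be tested on a much larger sample.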
Another important dimension of tagging behaviour concerns users' motivations for tagging. Marlow et al. (2006, p. 5) have listed a range of potential motivations that influence tagging behaviour, such as personal information management (for the purpose of future retrieval), contribution and sharing (to increase access to resources and allow other users to access content), attracting attention to or promoting specific resources, play and competition, self-presentation, opinion expression, etc. The range of motivations in turn affects the types of tags that are produced for a given resource. A more general categorization of user tagging motivation was proposed by Hammond, who differentiates between tagging for personal reasons or personal resource management ("self") and social tagging, i.e. tagging for other people ("others").
One more significant dimension of research that reveals the nature of tagging deals with the distinction in tagging behaviour between experts/professionals and novices/amateurs, where predominantly two groups of studies can be identified: 1. studies eliciting differences between keywords/subject headings assigned by information professionals and tags assigned by users; 2. studies identifying differences in tag assignment between knowledgeable/expert users of social tagging or bookmarking services and users not familiar with social tagging and tag creation in general.
The first group of studies concentrates on differences between tags assigned by professionals (i.e. information professionals) and those assigned by the average user. Such studies usually compare tags with controlled vocabularies or with descriptors/subject headings derived from controlled vocabularies to determine how tags differ from keywords, subject headings or descriptors assigned by experts.
In an early study, Kipp (2005) examined the context of online indexing from the viewpoint of three different groups: users, authors, and intermediaries. User, author and intermediary keywords were collected from journal articles tagged on CiteULike and analysed. Descriptive statistics and thesaural term comparison showed that there are important differences in the context of keywords from the three groups. A more recent study conducted by the same author Kipp (2005) collected user tags, author keywords and descriptors from academic journal articles which were both indexed in PubMed and tagged on CiteULike. The study showed that, in addition to similarities which can be used to enhance support for searching and browsing, there are important differences in the use of keywords between the three groups. While some tags and author keywords matched descriptors exactly, other terms were found that did not match but provided an important expansion of the indexing lexicon.
A study on term selection patterns by Heckner, Mühlbacher and Wolff (2008, p. 13) reported that user-assigned tags tend to be either more general or more specific than terms assigned by authors. Other studies identified differences between experts and users by comparing terms assigned by experts within the LCSH (Library of Congress Subject Headings) scheme and tags assigned by users to the same materials (Wetterstrom, 2008; Rolla, 2009). Content analyses of tags assigned to titles in LibraryThing and subject headings assigned to the same items, conducted by different authors (e.g. Adler, 2009; Pirmann, 2008), suggest similar differences. More specifically, Golder and Huberman (2006, p. 203) conducted an analysis of tags in Delicious and found that, in addition to content-related tags, users also assigned tags relating to the use of an item (e.g., to read), ownership of an item, and task organization (e.g., job search). Thomas, Caudle and Schmitz (2009, p. 429) analyzed tags assigned to titles in LibraryThing and found that 35% of tags represented synonyms or related concepts that are not used in LCSH, further supporting the notion that tags have significant potential to enhance subject access. Taken together, these findings suggest that experts and users (non-experts) employ vocabularies that do not fully overlap and show significant differences.
The second group of studies on the tagging behaviour of experts and non-experts focuses on users' familiarity with social tagging and its influence on the choice of tags. Expertise is here defined as knowing, understanding or being familiar with social tagging processes. Users who have high familiarity with social tagging are considered experienced (expert) tag creators, and they are likely to select tags that come from a common vocabulary shared by a community of users in a social tagging system, acquired through a process of learning and exploring the content and interacting with other users (Golder and Huberman, 2006, p. 205). In contrast, novice tag creators (i.e. those with a low level of familiarity with social tagging), because of their unfamiliarity with tagging concepts or the community of users, may apply inappropriate terms or terms that have meaning only to themselves (Marlow et al., 2006, p. 37). Therefore, some authors argue that familiarity, as well as expertise, is a fundamental issue to be investigated in social tagging, with a direct bearing on the effectiveness of tags. Lee, Goh, Razikin and Chua (2009) explored the relationship between familiarity with tagging and the effectiveness of tags for content sharing. Their overall finding suggests that experts (i.e. people with high familiarity with the concept of tagging and with social tagging systems) are likely to perform better than novices (i.e. people with low familiarity) in creating effective tags for content sharing.
Tsai, Hwang and Tang (2011) focused on two questions: how similar are tags generated by different taggers, and how well do the tags represent the main concepts of the articles? The main purpose of their study was to examine whether experts can provide a more consistent and representative set of tags for academic and scientific documents than novices can. The findings showed that the tags assigned by the expert group differed from those assigned by the novice group, especially in pairwise similarity. Tags chosen by experts were more consistent and reflected domain knowledge and competent understanding.
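One way to make the notion of pairwise similarity concrete is sketched below. The similarity measure used by Tsai, Hwang and Tang is not specified in this text, so the Jaccard coefficient is an assumption chosen for illustration, and the tag sets and the function names `jaccard` and `mean_pairwise_similarity` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two tag sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(tag_sets):
    """Average Jaccard coefficient over all pairs of taggers in a group."""
    pairs = list(combinations(tag_sets, 2))
    if not pairs:
        raise ValueError("need at least two taggers")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented tag sets, three taggers per group (not data from any study).
experts = [
    {"folksonomy", "controlled vocabulary"},
    {"folksonomy", "controlled vocabulary"},
    {"folksonomy", "social tagging"},
]
novices = [
    {"folksonomy", "research"},
    {"tags", "web"},
    {"folksonomy", "vocabulary", "social"},
]

expert_consistency = mean_pairwise_similarity(experts)
novice_consistency = mean_pairwise_similarity(novices)
print(f"experts: {expert_consistency:.3f}, novices: {novice_consistency:.3f}")
```

A higher group mean indicates that taggers in that group agree more often on the same terms. Exact set overlap ignores spelling variants and singular/plural forms, so a real analysis would likely need some normalization first.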
The presented review of studies reveals that:
◊ Experts/professionals and users tag differently with regard to diverse criteria (number of tags, tag categories, specificity etc.)
◊ Familiarity with social tagging plays an important role in the creation and usage of effective tags for content sharing.
The following study will test these two assumptions in an academic/educational context.

Research Questions
The preceding discussion and literature review have shown that tagging is considered and perceived as a viable alternative to traditional systems of organizing information. However, to determine the real value and sustainability of tags and folksonomies, additional insights and research data are needed. This study focuses on differences in the tagging behaviour of experts and novices. The main research questions are:
1. Can previous findings on differences in tagging behaviour between experts and novices be confirmed when tagging is performed in an educational setting?
2. In what respects (number of tags, tag categories) do tags assigned by experts and novices differ?
Specifically, the study will concentrate on differences in the number of tags, the added-value of tags in comparison to other keywords within the text, language preferences, differences in tag categories etc.

Participants and Research Design
The research was conducted on two sample groups totalling 112 persons. Fifty-eight of them were freshmen students of information sciences, i.e. future professionals, and fifty-four were students in the second year of the master's programme, also at the Department of Information Sciences. Since participants from the second year of the MA programme had completed mandatory courses in indexing, where they were introduced to the concepts of indexing, controlled vocabularies and tagging and were obliged to use social bookmarking services (e.g. Delicious), they were considered to be experts. In contrast, freshmen students were considered to be novices, since they had no knowledge of indexing theory and had not gained deeper insights into tagging processes. Both groups were asked to read an article and, using Delicious, to assign tags first on the basis of the abstract alone and then after reading the entire article. The selected article was "Descriptor and Folksonomy Concurrence in Educational Related Scholarly Research" by Robert Bruce.

Research Limits
The generalization of results is constrained by two factors: the small number of participants and the controlled environment in which the study was conducted. Participants were asked to assign their tags in class. Therefore, participants assigned their tags in a very focused and careful manner, intentionally avoiding misspelt or sloppy tags and tags that carry personal meaning (e.g. "to-do", "me" etc.), which are a feature of spontaneous tagging, as research has shown (Golder and Huberman, 2006).

Results
The article itself had around 1800 words, and its abstract around 150 words. After reading only the abstract, freshmen students assigned 43 different tags, whereas the masters' students assigned just 30. The total number of tags assigned by novice users was 268 and the number assigned by experts was 229, which means that the average novice user assigned 4.6 tags while the average expert user assigned 4.3 tags, as shown in Figure 1. When it comes to the most frequently used tags, the results were similar. The tags assigned by novice and expert users are shown in Table I. As evident from Table I, three tags ("classification", "ERIC", "vocabulary") were used in just one group of the sample.
After reading the whole article, the difference between the groups was much more pronounced. The more advanced users assigned 309 tags in total, or 5.7 tags per person. Novice users, on the other hand, assigned 494 tags in total, or 8.5 tags per user. Freshmen students assigned 82 different tags, whereas experts assigned 39 different terms (Figure 2). If the two research groups are combined into one single average user, the results are as follows: the average user (experts and novices included) assigned 4.4 tags after reading only the abstract. The average percentage of different tags assigned by both user groups is shown in Table III. If these findings are analyzed in more detail, of the 39 different tags assigned by experts, 16 (41%) were used only once, by one user. Only ten tags were assigned by more than ten users. When it comes to novices, 37 out of 82 tags (45%) were assigned once. Furthermore, ten tags were assigned by two users. It is important to note that the vast majority of those tags were words taken from a sentence without contextual significance, such as "lack of organization", "research", "folksonomy based website" etc. The difference in tag frequency in the expert and novice groups is shown in Figure 3.
Sonja ŠPİRANEC and Mislav BOROVAC
Tagging efficiency was also measured by comparing author-supplied keywords with experts' and novices' tags. The author himself assigned the following eight tags: "Collaborative tagging"; "Social tagging"; "Social classification"; "Knowledge organization"; "Taxonomies"; "Folksonomies"; "Controlled vocabulary"; "Descriptors". Novices, in general, used single-word tags. Apart from the tags in chart 7, they used the tags "taxonomy" (18%), "social" (18%), and even "collaborative" (12%), which, due to the lost semantic context, could be described as inefficient tags. Only a very small percentage combined them into a single tag such as "social classification" (6%). "Knowledge organization" was assigned by 12% of novices.
Experts, however, did somewhat better. "Knowledge organization" was assigned by 18% of users, "social classification" by 15%, and "collaborative tagging" by 12%. Overall, they used significantly more compound terms. Figure 4 shows that tags chosen by experts are more similar to author-supplied keywords. It is important to stress that not a single tag was assigned in the participants' mother tongue (Croatian); all tags were assigned in English.
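The comparison with author-supplied keywords can be sketched as follows. The keyword list is the one quoted above; the two example tag lists are hypothetical individual users built from terms mentioned in the text, and the matching rule (case-insensitive exact match, no stemming) is an assumption rather than the procedure used in the study.

```python
AUTHOR_KEYWORDS = [
    "Collaborative tagging", "Social tagging", "Social classification",
    "Knowledge organization", "Taxonomies", "Folksonomies",
    "Controlled vocabulary", "Descriptors",
]


def normalize(term):
    """Lowercase a term and collapse internal whitespace."""
    return " ".join(term.lower().split())


def is_compound(term):
    """True if the term consists of more than one word."""
    return len(normalize(term).split()) > 1


def exact_matches(tags, keywords=AUTHOR_KEYWORDS):
    """Tags that match an author keyword exactly (no stemming applied)."""
    keyset = {normalize(k) for k in keywords}
    return sorted({normalize(t) for t in tags} & keyset)


def keyword_fragments(tags, keywords=AUTHOR_KEYWORDS):
    """Single-word tags that are only one word of a compound keyword."""
    compounds = [normalize(k).split() for k in keywords if is_compound(k)]
    return [t for t in map(normalize, tags)
            if not is_compound(t) and any(t in words for words in compounds)]


# Hypothetical individual users (illustrative only).
novice_tags = ["social", "collaborative", "taxonomy", "folksonomy"]
expert_tags = ["knowledge organization", "social classification",
               "collaborative tagging", "folksonomy"]

expert_hits = exact_matches(expert_tags)
novice_hits = exact_matches(novice_tags)
novice_fragments = keyword_fragments(novice_tags)
print(expert_hits, novice_hits, novice_fragments)
```

Note that "folksonomy" does not match "Folksonomies" under this strict rule. Whether singular and plural forms should count as a match is a design decision that changes the measured overlap.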

Discussion
Although the results of this study must be interpreted cautiously because of the small number of participants, the study indicates that there are some differences in the tagging behaviour of experts and novices. These differences pertain to the number of tags assigned as well as to the tags that were chosen for describing the content.
When tagging is based on less information and shorter texts (in our case on title and abstract), there are actually only small differences between experts and novices. The difference is confirmed when analyzing the semantics of the assigned tags: novices tend to choose a greater variety of tags in describing content. The higher dispersion of tags on the numerical and semantic level inherent to the indexing of novices indicates that persons more knowledgeable and experienced in indexing, tagging or organizing information are more consistent in choosing tags.
This assumption was confirmed during the analysis of the features of tags that participants assigned after reading the whole text. The probability of assigning more diverse tags or tags of greater variability rises when dealing with full text because of the greater volume of words, concepts or information that users have to summarize or identify as important and representative of the whole text. The number of tags assigned after reading the whole text was higher for experts as well as novices. However, the number of tags assigned by novices almost doubled after reading the whole article. A possible explanation of this difference could be that experts have developed better competencies for describing the subject or aboutness of resources, regardless of text length. In contrast, novices are much more spontaneous and inconsistent in their tagging decisions and are evidently influenced by factors such as text length.
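The growth described above can be checked directly from the totals reported in the Results section. The short sketch below only recomputes ratios from those reported figures; the variable and function names are illustrative.

```python
# Totals as reported in the Results section.
PARTICIPANTS = {"novices": 58, "experts": 54}
TAGS_ABSTRACT = {"novices": 268, "experts": 229}
TAGS_FULL = {"novices": 494, "experts": 309}


def growth_ratio(group):
    """Tags after the full article divided by tags after the abstract."""
    return TAGS_FULL[group] / TAGS_ABSTRACT[group]


def mean_tags_full(group):
    """Average number of tags per participant after the full article."""
    return TAGS_FULL[group] / PARTICIPANTS[group]


novice_growth = growth_ratio("novices")
expert_growth = growth_ratio("experts")

for group in ("novices", "experts"):
    print(f"{group}: x{growth_ratio(group):.2f} growth, "
          f"{mean_tags_full(group):.1f} tags per person after full text")
```

Because group sizes did not change between the two stages, the ratio of totals equals the ratio of per-person averages. The novice total grows by a factor of roughly 1.84, against roughly 1.35 for experts.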
One more interesting difference that occurred in tag allocation was the use of tags that describe the same content but on different hierarchical levels, e.g. "controlled vocabulary" and "vocabulary". Novices showed a stronger tendency towards using the same concepts on different hierarchical levels, although it is obviously redundant to use both of these tags for describing the same resource and doing so can cause information ballast or noise. That experts are less redundant in tag assignment is also confirmed by the overall lower number of tags they used.
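A rough way to flag this kind of redundancy automatically is sketched below. It treats a tag as a broader form of another when its words appear as a contiguous phrase inside the longer tag; this string-based heuristic is only a proxy for a true hierarchical relation and is not the method used in the study. The function names and the example tag list are hypothetical.

```python
def tokens(tag):
    """Lowercase a tag and split it into a tuple of words."""
    return tuple(tag.lower().split())


def contains_phrase(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    s, l = tokens(short), tokens(long)
    if not s or len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def redundant_pairs(tags):
    """Return (broader, narrower) pairs found within one user's tag list."""
    unique = sorted({" ".join(tokens(t)) for t in tags})
    return [(a, b) for a in unique for b in unique if contains_phrase(a, b)]


# Hypothetical tag list of a single user (illustrative only).
tags = ["controlled vocabulary", "vocabulary", "folksonomy",
        "social tagging", "tagging"]

pairs = redundant_pairs(tags)
print(pairs)
```

The heuristic misses hierarchical relations that are not visible in the wording (for example "taxonomy" and "classification") and would need a thesaurus to detect them.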
One more important dimension that differentiates expert from novice tagging is the distribution of tags. The overall results suggest the existence of a power law curve, according to which certain tags set themselves apart from others on the basis of their frequency. In this study, the most frequently used tag, "folksonomy", was assigned by over 90 percent of all users; it is the most used tag in the group of experts as well as in the group of novices. The term "tag" was the second most used tag in both groups. According to Peters (2009, p. 323), such tags can be named "power tags", since they best describe the resource's content and reflect the implicit consensus of the user community, although they are semantically very broad. One important finding of this study is that the tag distribution differs between the expert and novice samples. Novices assigned 82 different tags, whereas experts assigned 39 different terms, meaning that the variability and distribution of tags were much higher in the group of novices and that there was considerable disagreement about which tags are useful. To put it differently, the group of experts developed consensus on resource semantics and consistency in tag assignment, which implies less information noise and ballast as well as better suitability for retrieval and resource discovery.
The study has produced an interesting result pertaining to the whole sample, experts as well as novices. All participants assigned tags in English, and not one tag was expressed in the participants' mother tongue, Croatian. Previous studies (e.g. Guy and Tonkin, 2006; Vuorikari and Ochoa, 2009) found that although the bulk of tags was in English, tags in other languages were present as well, and that users sometimes used multiple languages to tag one and the same resource. In our study, the article participants were asked to tag was written in English, and students were not instructed which language to use. Nevertheless, the fact that out of 803 tags in total not one was assigned in Croatian was surprising. One explanation is that users simply derived words mechanically from the text, without providing any added value in the form of translation or interpretation. However, in this study they were given an assignment, and it is possible that participants would tag differently if they were doing it for personal purposes.

Conclusion
Throughout history, information organization and knowledge representation were never brought to perfection, despite the tremendous development of information and communication technologies. Different knowledge organization systems were developed, but every approach implemented came with drawbacks. Novel forms of organizing information in Web 2.0 environments, tagging and folksonomies, have added an interesting twist to the traditional debate on how to optimize access to information. This paper contributes to research findings on tagging and folksonomies, which are necessary in order to evaluate the potentials (and disadvantages) of these forms of organizing information knowledgeably and critically. Specifically, it focuses on differences in tagging behaviour between experts and novices.
The research has shown that the professionals assigned a smaller number of tags than those users who are yet to become professionals. However, regardless of the smaller number of tags assigned, those tags were generally of higher relevance to the subject, especially after reading the article.
Overall, the group of experts seemed to be more consistent in their tagging. The tag distribution showed that experts assigned fewer tags, with a lower number of tags that were assigned just once. This means that there was less disagreement in the expert group about describing the resource. Novices more often assigned single-word tags, causing loss of the context and meaning of the tag and thereby generating information ballast or tag noise.
On the other hand, a detailed analysis of the numbers shows that, when it comes to tagging, there is a thin line between professionals and frequent users. Differences in frequency, tag categories or tag features between novices and experts are not as large as expected. The reason for this may be the small number of participants, the controlled environment of the study or the type of resource that participants were asked to tag (an academic paper). Still, the reason could also be the fact that Web 2.0 has become the new web standard. Users are allowed to create their own personalized environment based on their own preferences, which means that good organization skills are required by default and are developed intuitively. To draw valid conclusions about this issue, further research on tagging differences between experts and novices/amateurs on other types of resources (e.g. videos) is needed.
Tagging is by definition user-generated and user-oriented. However, our research findings indicate that more knowledgeable users (in our case the "expert group") achieved more efficient results in tagging: their tags were more focused and consistent, and their generated tag base was less redundant and more precise in semantics. Although the findings are open to interpretation because of the small sample, they indicate that tagging experience and knowledge of tagging (developed in both formal and informal settings) could lead to higher efficiency in tagging.

Figure 1: Average Number of Different Tags Assigned by Novices and Experts after Reading the Abstract

Figure 2: Average Number of Tags Assigned by Novices and Experts after Reading the whole Article
Terms assigned by novices and experts after reading the whole article are shown in Table II.

Figure 3: Tag Frequency Curve Derived from Tags Provided by Novices and Experts

Figure 4: Authors' Tags Compared to Tags Assigned by Experts and Novices

Table I: Tags Assigned by Novice and Expert Users after Reading the Abstract

Table II: Tags Assigned after Reading the whole Article

Table III: Average Percentage of Different Tags Assigned by Novices and Experts