Creating an Autotagger with Yahoo’s Term Extraction Service and YUI
Wednesday, October 22nd, 2008
So let’s talk about tags… If you edit a blog, photos, or even bookmarks these days, you know all about tagging. In case you don’t know what tags are, you should read up on them, because you are missing a great thing on the internet (http://en.wikipedia.org/wiki/Tag_(metadata)).
It got me thinking (scary, I know) that perhaps it is a pain in the ass as a content developer to figure out what your tags should be. I mean, come on, how do I know what people are searching for? Isn’t there a better way? Well, perhaps there is. Perhaps we can tie into a powerful open search API like Yahoo’s to tell us what the popular terms in an article are and build some tags from them.
Hmm, that seems too easy, right? Well, guess what: it is! Just look for yourself:

Test out the autotagger for yourself
So how the heck do you do that? Simple, just follow along:
The first thing you need to do is go to http://developer.yahoo.com and register an appid. This allows you 5000 searches a day against their open APIs, which is more than enough for personal use. You then need to look over Yahoo’s Term Extraction service at http://developer.yahoo.com/search/content/V2/termExtraction.html to see what is required.
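To make the request a little more concrete, here is a minimal sketch of how the lookup URL could be put together. It only uses the three parameters this post relies on (appid, output and context) and the endpoint that the proxy further down calls. The helper name buildTermExtractionUrl and the YOUR_APP_ID placeholder are made up for illustration, so treat this as a sketch rather than the service's official client.

```javascript
// Endpoint taken from the proxy shown later in this post.
var TERM_EXTRACTION_URL =
    'http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction';

// Hypothetical helper: builds the request URL from an appid and the text to analyze.
function buildTermExtractionUrl(appid, text) {
    var params = [
        'appid=' + encodeURIComponent(appid),
        'output=json',                          // ask for JSON back
        'context=' + encodeURIComponent(text)   // the content to pull terms from
    ];
    return TERM_EXTRACTION_URL + '?' + params.join('&');
}

// YOUR_APP_ID is a placeholder, not a real appid.
console.log(buildTermExtractionUrl('YOUR_APP_ID', 'tags & search terms'));
```

Encoding the text matters here: an unescaped & in the article would otherwise be read as the start of a new parameter.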
Also, for fun, let’s put in a rich text editor, because, well, let’s be user friendly and realistic about the environment. You can view what is required for that at: http://developer.yahoo.com/yui/editor/
So let’s write some code for our front end:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>RTE Autotagger</title>
<!-- Skin CSS file -->
<link rel="stylesheet" type="text/css" href="http://yui.yahooapis.com/2.6.0/build/assets/skins/sam/skin.css">
</head>
<body class="yui-skin-sam">
<h2>RTE Autotagger</h2>
<form id="rtepost" method="post">
<br>
<textarea id="editor" name="testrte"></textarea>
<br>
Tags:
<input type="text" id="mytags" size="50">
<input type="button" value="Guess My Tags" id="mytagbtn">
<img src="http://l.yimg.com/jn/images/20081008192436/ajax-loader2.gif" height="20" width="20" id="loading" style="display:none;">
<!--<input type="submit" />-->
</form>
<!-- Utility Dependencies -->
<script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/yahoo-dom-event/yahoo-dom-event.js"></script>
<script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/element/element-beta-min.js"></script>
<!-- Needed for Menus, Buttons and Overlays used in the Toolbar -->
<script src="http://yui.yahooapis.com/2.6.0/build/container/container_core-min.js"></script>
<script src="http://yui.yahooapis.com/2.6.0/build/menu/menu-min.js"></script>
<script src="http://yui.yahooapis.com/2.6.0/build/button/button-min.js"></script>
<!-- Source file for Rich Text Editor -->
<script src="http://yui.yahooapis.com/2.6.0/build/editor/editor-min.js"></script>
<!-- Source for Connection Manager -->
<script src="http://yui.yahooapis.com/2.6.0/build/connection/connection-min.js"></script>
<script>
(function() {
    var Dom = YAHOO.util.Dom,
        Event = YAHOO.util.Event,
        Lang = YAHOO.lang,
        Connect = YAHOO.util.Connect;

    var myEditor = new YAHOO.widget.Editor('editor', {
        height: '300px',
        width: '522px',
        dompath: true, // Turns on the bar at the bottom
        animate: true  // Animates the opening, closing and moving of Editor windows
    });
    myEditor.render();

    var generateTags = function(o) {
        // the response comes from our own proxy, so a plain eval is tolerable here
        var mydata = eval('(' + o.responseText + ')');
        // drop in tags from the JSON object (the array is joined with commas)
        Dom.get('mytags').value = mydata;
        // replace all spaces; this is where you could do other filtering
        // or drop in a dash between the words of a tag
        Dom.get('mytags').value = Dom.get('mytags').value.replace(/ /g, '');
        // turn off progress spinner
        Dom.setStyle('loading', 'display', 'none');
    };

    // add event to tags button
    Event.on('mytagbtn', 'click', function() {
        // turn on progress spinner
        Dom.setStyle('loading', 'display', '');
        // grab content of the RTE window, strip out HTML, and URL-encode it for the POST body
        var myContent = 'myContent=' + encodeURIComponent(
            myEditor._getDoc().body.innerHTML.replace(/(<([^>]+)>)/ig, ""));
        // make ajax call to our simple proxy
        var myTagsCnt = Connect.asyncRequest('POST', 'make_tags_api.php', {
            success: generateTags,
            // on failure, at least stop the spinner
            failure: function() {
                Dom.setStyle('loading', 'display', 'none');
            },
            scope: this
        }, myContent);
    });
})();
</script>
</body>
</html>
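The two bits of string handling in that page (stripping the HTML out of the editor content, and squashing the returned terms into tags) are easy to try on their own. Here is a rough sketch, assuming the terms come back as an array of strings. stripHtml reuses the same regex as the front end, while termsToTags and its wordSeparator argument are hypothetical names added to show the "drop in a dash" idea from the code comment.

```javascript
// Same regex the front end uses: drops anything that looks like an HTML tag.
function stripHtml(html) {
    return html.replace(/(<([^>]+)>)/ig, '');
}

// Hypothetical helper: turns extracted terms into a comma-separated tag list.
// wordSeparator is '' to squash words together, or '-' to put a dash between them.
function termsToTags(terms, wordSeparator) {
    var tags = [];
    for (var i = 0; i < terms.length; i++) {
        var trimmed = terms[i].replace(/^\s+|\s+$/g, '');
        tags.push(trimmed.replace(/\s+/g, wordSeparator));
    }
    return tags.join(',');
}

console.log(stripHtml('<p>The <b>financial crisis</b> deepens</p>'));
// The financial crisis deepens
console.log(termsToTags(['financial crisis', 'henry paulson'], ''));
// financialcrisis,henrypaulson
console.log(termsToTags(['financial crisis', 'henry paulson'], '-'));
// financial-crisis,henry-paulson
```

Note that the regex is a quick-and-dirty strip; it is fine for feeding text to a term extractor, but it is not a sanitizer.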
Now let’s write our simple proxy API:
<?php
// might want to add some security here to make sure only you are hitting your api ;)

// set the response content type to JSON
header("Content-Type: application/json");

// create curl function to do a simple proxy for yahoo search
function getContextResource($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

// pull your posted content from POST (empty string if nothing was sent)
$myContent = isset($_POST['myContent']) ? $_POST['myContent'] : '';

// create url to curl: add in your appid, the output format, and the content to check
// (utf8-encoded, then url-encoded)
$contextUrl = 'http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction?appid=yourapid&output=json&context=' . urlencode(utf8_encode($myContent));

// create curl call from our url
$feed = getContextResource($contextUrl);

// convert to array so we can cut out the things we don't need
$convertToArray = json_decode($feed, true);

// only report back the values we need (empty array if the lookup failed)
$cleanedArray = isset($convertToArray['ResultSet']['Result']) ? $convertToArray['ResultSet']['Result'] : array();

// return a json encoded copy of the array we just cleaned up
echo json_encode($cleanedArray);
?>
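The only part of the service response the proxy cares about is ResultSet.Result. As a rough sketch of that same "cut out what we don't need" step, here it is in JavaScript with a few defensive checks. The function name extractTerms and the sample response string are made up for illustration; only the ResultSet/Result nesting comes from the proxy code above.

```javascript
// Hypothetical helper: pulls the term list out of a raw JSON response string.
// Returns an empty array if the text is not JSON or the expected keys are missing.
function extractTerms(responseText) {
    var parsed;
    try {
        parsed = JSON.parse(responseText);
    } catch (e) {
        return [];
    }
    var result = parsed && parsed.ResultSet && parsed.ResultSet.Result;
    if (result === undefined || result === null || result === '') {
        return [];
    }
    // Defensive: wrap a lone value so callers can always treat the result as a list.
    return Array.isArray(result) ? result : [result];
}

// Illustrative sample only, shaped like the structure the proxy reads.
var sample = '{"ResultSet":{"Result":["financial crisis","recession"]}}';
console.log(extractTerms(sample));           // [ 'financial crisis', 'recession' ]
console.log(extractTerms('not json at all')); // []
```

Handing back an empty list instead of an error keeps the "Guess My Tags" button harmless when the lookup fails: the tags box just ends up empty.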
Wow, after all of that, what do we get? Well, if I copy a well-written article from Yahoo Finance such as:
http://biz.yahoo.com/ap/081022/financial_meltdown.html
I get the following tags back:
treasurysecretaryhenrypaulson,apeconomics,henrypaulson,martincrutsinger, aggressivesteps,bushadministration,ysm,financialcrisis,recession,infinity, advertisement,economy,yahoo
Hmm, some of those are really not bad at all for just a simple search API, huh?




