Creating an Autotagger with Yahoo’s Term Extraction Service and YUI

So let’s talk about tags. If you edit a blog, photos, or even bookmarks these days, you know all about tagging. In case you don’t know what tags are, you should read up on them, because you are missing out on a great thing on the internet (http://en.wikipedia.org/wiki/Tag_(metadata)).

It got me thinking (scary, I know) that it can be a pain in the ass as a content developer to figure out what your tags should be. I mean, come on, how do I know what people are searching for? Isn’t there a better way? Well, perhaps there is. Perhaps we can tie into a powerful open search API, like the ones Yahoo offers, to tell us what the popular terms in an article are and build some tags from them.

Hmm, that seems too easy, right? Well, guess what: it is! Just look for yourself:

Test out the autotagger for yourself

So how the heck do you do that? Simple, just follow along:

The first thing you need to do is go out to http://developer.yahoo.com and register an appid. This allows you 5000 searches a day on their open APIs, which is more than enough for personal use. Then look over Yahoo’s Term Extraction service at: http://developer.yahoo.com/search/content/V2/termExtraction.html to see what is required.
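To give a feel for what a request looks like, here is a rough sketch of building a Term Extraction URL in JavaScript. The endpoint and the appid, output, and context parameters are the ones used by the proxy script further down in this post; the helper name buildTermExtractionUrl and the sample sentence are made up for illustration, so check everything against Yahoo’s docs before relying on it.

```javascript
// Endpoint copied from the proxy script later in this post.
var TERM_EXTRACTION_URL =
  'http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction';

// Hypothetical helper: builds the request URL for one block of text.
function buildTermExtractionUrl(appId, text) {
  var params = {
    appid: appId,   // the application ID you registered
    output: 'json', // ask for a JSON response
    context: text   // the text to pull terms from
  };
  var pairs = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      // Encode both sides so spaces, & and = in the text cannot break the query string
      pairs.push(encodeURIComponent(name) + '=' + encodeURIComponent(params[name]));
    }
  }
  return TERM_EXTRACTION_URL + '?' + pairs.join('&');
}

// 'yourapid' is a placeholder, swap in your own appid
console.log(buildTermExtractionUrl('yourapid', 'Italian sculptors and painters'));
```

A browser can’t call this URL directly with a plain Ajax request because of the same-origin policy, which is why the post goes through a small server-side proxy.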

Also, for fun, let’s put in a rich text editor, because, well, let’s be user friendly and realistic about the environment. You can see what is required for that at: http://developer.yahoo.com/yui/editor/

So let’s write some code for our front end:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head>
    <title>RTE Autotagger</title>
    <!-- Skin CSS file -->
    <link rel="stylesheet" type="text/css" href="http://yui.yahooapis.com/2.6.0/build/assets/skins/sam/skin.css">
    </head>
    <body class="yui-skin-sam">
    <h2>RTE Autotagger</h2>
    <form id="rtepost" method="post">
    <br>
    <textarea id="editor" name="testrte"></textarea>
    <br>
    Tags:
    <input type="text" id="mytags" size="50">
    <input type="button" value="Guess My Tags" id="mytagbtn">
    <img src="http://l.yimg.com/jn/images/20081008192436/ajax-loader2.gif" height="20" width="20" id="loading" style="display:none;">
    <!--<input type="submit" />-->
    </form>
    <!-- Utility Dependencies -->
    <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/yahoo-dom-event/yahoo-dom-event.js"></script>
    <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/element/element-beta-min.js"></script>
    <!-- Needed for Menus, Buttons and Overlays used in the Toolbar -->
    <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/container/container_core-min.js"></script>
    <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/menu/menu-min.js"></script>
    <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/button/button-min.js"></script>
    <!-- Source file for Rich Text Editor-->
    <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/editor/editor-min.js"></script>
    <!-- Source for Connection Manager -->
    <script type="text/javascript" src="http://yui.yahooapis.com/2.6.0/build/connection/connection-min.js"></script>
    <script type="text/javascript">
    (function() {
        var Dom = YAHOO.util.Dom,
            Event = YAHOO.util.Event,
            Lang = YAHOO.lang,
            Connect = YAHOO.util.Connect;

        var myEditor = new YAHOO.widget.Editor('editor', {
            height: '300px',
            width: '522px',
            dompath: true, //Turns on the bar at the bottom
            animate: true //Animates the opening, closing and moving of Editor windows
        });
        myEditor.render();

        //show or hide the progress spinner
        var showSpinner = function(visible) {
            Dom.setStyle('loading', 'display', visible ? '' : 'none');
        };

        var generateTags = function(o) {
            //the response comes from our own proxy, so eval is tolerable here
            var mydata = eval('(' + o.responseText + ')');
            //drop in tags from json object; the proxy normally returns an array of terms
            var tags = Lang.isArray(mydata) ? mydata.join(',') : String(mydata || '');
            //replace all spaces; this is where you could do other filtering or drop in a dash between tags
            Dom.get('mytags').value = tags.replace(/ /g, '');
            //turn off progress spinner
            showSpinner(false);
        };

        //add event to tags button
        Event.on('mytagbtn', 'click', function() {
            //turn on progress spinner
            showSpinner(true);
            //grab content of the rte window and swap each html tag for a space so words don't run together
            var text = myEditor._getDoc().body.innerHTML.replace(/<[^>]+>/g, ' ');
            //url-encode the text so & and = in the article can't break the post body
            var myContent = 'myContent=' + encodeURIComponent(text);
            //make ajax call to our simple proxy
            Connect.asyncRequest('POST', 'make_tags_api.php', {
                success: generateTags,
                //turn off the spinner if the request fails
                failure: function() { showSpinner(false); },
                scope: this
            }, myContent);
        });
    })();
    </script>
    </body>
    </html>
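The click handler does two small things before it talks to the proxy: it strips the markup out of the editor’s HTML, and it packs the text into a form field called myContent. Here is a rough standalone sketch of that step. The function names stripTags and buildPostBody are made up for this example, and a regular expression is only a quick-and-dirty way to drop tags, not a real HTML parser.

```javascript
// Hypothetical helper: turn editor HTML into plain text.
function stripTags(html) {
  return html
    .replace(/<[^>]+>/g, ' ')    // swap each tag for a space so neighbouring words don't merge
    .replace(/&nbsp;/g, ' ')     // rich text editors tend to sprinkle these around
    .replace(/\s+/g, ' ')        // collapse runs of whitespace
    .replace(/^\s+|\s+$/g, '');  // trim both ends
}

// Hypothetical helper: build the form-encoded body for the POST to the proxy.
function buildPostBody(html) {
  // Without encodeURIComponent, an & in the article would end the myContent field early
  return 'myContent=' + encodeURIComponent(stripTags(html));
}

console.log(stripTags('<p>Stocks &amp; bonds</p><p>fell today</p>'));
console.log(buildPostBody('<p>Stocks &amp; bonds</p><p>fell today</p>'));
```

Note that the sketch leaves entities such as &amp;amp; in the text; decoding them is a separate job that is skipped here.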

Now let’s write our simple proxy API:

    <?php
    //might want to add some security here to make sure only you are hitting your api ;)
    //set the response content type to json
    header("Content-Type: application/json");
    //create curl function to do a simple proxy for yahoo search
    function getContextResource($url){
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $result = curl_exec($ch);
        curl_close($ch);
        return $result;
    }
    //pull your posted content from Post, falling back to an empty string if nothing was sent
    $myContent = isset($_POST['myContent']) ? $_POST['myContent'] : '';
    //create url to curl, add in your appid, output, content to check and then urlencode and utf8encode your content
    $contextUrl = 'http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction?appid=yourapid&output=json&context='.urlencode(utf8_encode($myContent));
    //create curl call from our url
    $feed = getContextResource($contextUrl);
    //convert to array so we can cut out the things we don't need
    $convertToArray = json_decode($feed, true);
    //only report back the values we need, or an empty list if the call failed
    $cleanedArray = isset($convertToArray['ResultSet']['Result']) ? $convertToArray['ResultSet']['Result'] : array();
    //return back a json encoded copy of the array we just cleaned up
    echo json_encode($cleanedArray);
    ?>
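The heart of the proxy is the clean-up step: take the service’s JSON, keep only ResultSet.Result, and throw the rest away. For anyone who would rather see that in JavaScript, here is a hedged sketch of the same idea. The extractTerms name and the sample response are invented for this example (the sample only mimics the shape the PHP code reads), and the single-string fallback is a defensive assumption rather than documented behavior.

```javascript
// Hypothetical helper mirroring the proxy's clean-up step.
function extractTerms(responseText) {
  var parsed;
  try {
    parsed = JSON.parse(responseText);
  } catch (e) {
    return []; // not JSON at all, e.g. an error page or an empty body
  }
  var result = parsed && parsed.ResultSet && parsed.ResultSet.Result;
  if (Array.isArray(result)) {
    return result;
  }
  // Assumption: treat a lone string as a one-item list, anything else as "no terms"
  return typeof result === 'string' && result !== '' ? [result] : [];
}

// Made-up sample shaped like the response the proxy expects
var sample = '{"ResultSet":{"Result":["financial crisis","henry paulson"]}}';
console.log(extractTerms(sample));
console.log(extractTerms('not json'));
```

Always handing back an array, even when the call fails, keeps the front end simple: it can join whatever it gets without checking for errors first.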

Wow, after all of that, what do we get? Well, if I copy a well-written article from Yahoo Finance such as:
http://biz.yahoo.com/ap/081022/financial_meltdown.html

I get the following tags back:
treasurysecretaryhenrypaulson,apeconomics,henrypaulson,martincrutsinger, aggressivesteps,bushadministration,ysm,financialcrisis,recession,infinity, advertisement,economy,yahoo

Hmm, some of those are really not bad at all for just a simple search API, huh.
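A few of those tags (advertisement, yahoo, ysm, infinity) are clearly page furniture rather than anything about the article. The comment in the front end code hints that this is the place for extra filtering or for dropping a dash between words, so here is one possible sketch of that. The ignore list and the formatTags name are made up for this example; you would tune the list to whatever junk your own pages produce.

```javascript
// Hypothetical ignore list, based on the noise in the sample output above
var IGNORED_TERMS = ['advertisement', 'yahoo', 'ysm', 'infinity'];

// Hypothetical helper: turn extracted terms into a comma-separated tag string.
// separator is what replaces the spaces inside a term: '' or '-' for example.
function formatTags(terms, separator) {
  var tags = [];
  for (var i = 0; i < terms.length; i++) {
    // Normalise case and trim before comparing against the ignore list
    var term = terms[i].toLowerCase().replace(/^\s+|\s+$/g, '');
    if (term === '' || IGNORED_TERMS.indexOf(term) !== -1) {
      continue;
    }
    var tag = term.replace(/\s+/g, separator);
    if (tags.indexOf(tag) === -1) { // skip duplicates
      tags.push(tag);
    }
  }
  return tags.join(',');
}

var terms = ['financial crisis', 'Henry Paulson', 'advertisement', 'yahoo', 'recession'];
console.log(formatTags(terms, '-')); // financial-crisis,henry-paulson,recession
console.log(formatTags(terms, ''));  // financialcrisis,henrypaulson,recession
```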

3 Responses to “Creating an Autotagger with Yahoo’s Term Extraction Service and YUI”

  1. Andres Says:

    Very cool! Can really use this right now for a CMS tool I’m building.

  2. David Says:

    Nice! Yeah, tagging is a bit of a pain. Love the example you posted - that sells it. Now if there was only a way to embed the example in your post :). I wonder if there’s a way to hack Wordpress to display iframes?

  3. In the Wild for October 30, 2008 » Yahoo! User Interface Blog Says:

    […] the use of one of YUI’s most popular developer tools, Julien Lecomte’s YUI Compressor.Creating an Autotagger with Yahoo’s Term Extraction Service and YUI: James Long wired the Yahoo Term Extractor API to a YUI Rich Text Editor to create an autotagging […]
