TOC TOC! - Who is There? - XOXO

by Klaus Graefensteiner 11. March 2008 00:45

Introduction

TOC is the abbreviation of Table Of Contents. This post introduces a BlogEngine 1.3 extension that automatically generates a table of contents from the h1-h6 heading tags found in the body of posts and pages. The TOC is rendered as either a <ul> or an <ol> HTML list of anchors (links) to the corresponding heading tags. When a post or page is saved for the first time, the TOC is placed where the tag [ t o c a u t o g e n ] (typed without the spaces) is located. On later updates the extension looks for the <div id="tocautogen">...</div> HTML construct and replaces it with the updated version of the TOC. The HTML that renders the TOC carries the xoxo class name, which marks the list as an XOXO outline microformat.

 

Watch a test drive of the TOCAutogen Extension

This screencast quickly demonstrates how to use the extension:

[youtube:R9jZHVq1ksA]

Note: This screencast doesn't include audio!

Installing the Extension

Download the extension files

Click on the following link to download the zip archive that includes the installation files: tocautogen.zip

Copy extension file

Extract the content of the tocautogen.zip archive and copy the AutoGenTableOfContents.cs file into the App_Code\Extensions folder under the BlogEngine application root folder.

Modify CSS

Modify your theme's CSS file to take advantage of custom formatting. As a starting point you could add the content of the XOXO.css file to your theme's CSS file. The XOXO.css file is included in the tocautogen.zip file.

/*TOCAutoGen support*/
#tocautogen
{
    background-color: #efefff;
    padding: 1px;
    margin: 10px;
    border-width: 1px;
    border-style: solid;
    border-color: silver;
}
/*Microformat support*/
/*XOXO Outline Microformat*/
ol.xoxo {
    list-style-type: decimal;
    color: #444444;
    font-size: 14px;
    font-weight: bold;
}
ol.xoxo ol {
    list-style-type: lower-latin;
    font-size: smaller;
}
ol[compact="compact"] {
    display: none;
}
ol.xoxo a:hover {
    color: #444444;
    font-size: 14px;
}
ol.xoxo a:visited {
    color: #444444;
}
ul.xoxo {
    color: #444444;
    font-size: 14px;
    font-weight: bold;
}
ul.xoxo ul {
    font-size: smaller;
}
ul.xoxo a {
    color: #5C80B1;
}
ul.xoxo a:hover {
    color: #444444;
    font-size: 14px;
}
ul.xoxo a:visited {
    color: #444444;
}
ul[compact="compact"] {
    display: none;
}

 

Development notes

During the development of the extension I came across some interesting observations that I would like to share.

Microformat

I like the Microformat way of adding meta meaning to regular HTML pages. Go to microformats.org if you want to learn more.

Regular Expressions

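Use the Singleline regex option if your match spans multiple lines. This seems counterintuitive, but Singleline makes the regex engine treat the whole input as one line: the dot (.) then matches every character, including the newline, so a pattern such as .*? can run across line breaks. Without the option, the dot stops at each newline. The Multiline option, despite its name, only changes where ^ and $ match.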
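
To make the effect concrete, here is a minimal sketch. The simplified h2 pattern, the class name SinglelineDemo and the sample string are assumptions made for illustration; they are not taken from the extension, whose full regex appears in the source listing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Simplified heading pattern (illustrative only, not the extension's regex).
    private const string Pattern = @"<h2>(?<TEXT>.*?)</h2>";

    // Returns true if the pattern matches the input under the given options.
    public static bool Matches(string html, RegexOptions options)
    {
        return Regex.IsMatch(html, Pattern, options);
    }

    public static void Run()
    {
        // The heading text is broken across two lines.
        string html = "<h2>Installing\nthe Extension</h2>";

        // Without Singleline the dot stops at '\n', so nothing matches.
        Console.WriteLine(Matches(html, RegexOptions.None));       // False

        // With Singleline the dot also matches '\n', so the heading is found.
        Console.WriteLine(Matches(html, RegexOptions.Singleline)); // True
    }
}
```

Headings that an editor wraps across lines are the reason the extension's regexes all carry this option.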

CSS

The ul.xoxo ul descendant selector matches every list nested inside a ul.xoxo list, at any depth. Because font-size:smaller is relative to the font size of the parent element, the text shrinks a little more with each level of nesting. A nice example of the power of CSS.

Source Code

Here is the source code listing of the TOC Autogen extension.

using System;
using BlogEngine.Core;
using BlogEngine.Core.Web.Controls;
using BlogEngine.Core.Web.HttpHandlers;
using System.Text.RegularExpressions;
using System.Collections.Generic;
using System.Web;
using System.Text;
using System.IO;

/// <summary>
/// This class is a BlogEngine Extension that generates a TOC (Table Of Contents) automatically based on the occurrences
/// of h1 - h6 tags found in the body of posts and pages
/// </summary>
[Extension("Automatically generates a Table Of Contents for Posts and Pages with anchors to html headings h1-h6", "1.3", "<a href=\"http://www.tellingmachine.com\">Klaus Graefensteiner</a>")]
public class AutoGenTableOfContents
{

    public AutoGenTableOfContents()
    {
        //Hook into the saving events of posts and pages
        Post.Saving += new EventHandler<SavedEventArgs>(Post_Saving);
        Page.Saving += new EventHandler<SavedEventArgs>(Page_Saving);
    }

    void Post_Saving(object sender, SavedEventArgs e)
    {
        Post post = (Post)sender;
        if ((e.Action == SaveAction.Insert || e.Action == SaveAction.Update) && BEX.TOC.IsDesired(post.Content))
        {
            post.Content = BEX.TOC.UpdateHTML(post.Content);
        }
    }

    void Page_Saving(object sender, SavedEventArgs e)
    {
        Page page = (Page)sender;
        if ((e.Action == SaveAction.Insert || e.Action == SaveAction.Update) && BEX.TOC.IsDesired(page.Content))
        {
            page.Content = BEX.TOC.UpdateHTML(page.Content);
        }
    }

}

namespace BEX
{
    /// <summary>
    /// This class represents the Table Of Contents data model
    /// </summary>
    public class TOC
    {
        public TOC()
        {

        }

        private const string Token = "tocautogen";

        public const string AnchorPrefix = "toc_";

        public const string xl = "ul"; //HTML list type (ul or ol)


        /// <summary>
        /// The regular expression used to find h1-6 heading tags.
        /// </summary>
        private static readonly Regex AnalyzeHeadingsRegex = new Regex(@"<\s*h(?'LEVEL'[1-6])(?'RAWATTRIBUTES'.*?)>(?'RAWHEADING'.*?)<\s*/\s*h\k'LEVEL'\s*>", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

        private static List<Heading> headings = new List<Heading>();

        /// <summary>
        /// The regular expression used to find the anchor span tags placed before the headings, e.g. id="toc_3"
        /// </summary>
        private static readonly Regex ReplaceAnchorRegex = new Regex("<\\s*span\\s+id=\"" + AnchorPrefix + "\\d+\"\\s*><\\s*/\\s*span\\s*>", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

        /// <summary>
        /// The regular expression used to detect the rendered table of contents
        /// </summary>
        private static readonly Regex FindOLRegex = new Regex("<\\s*div\\s+.*?id=\"" + Token + "\"", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

        /// <summary>
        /// The regular expression used to completely remove or replace the table of contents
        /// </summary>
        private static readonly Regex ReplaceOLRegex = new Regex("<\\s*div\\s+.*?id=\"" + Token + "\".*?>.*?<\\s*/\\s*div\\s*>", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

        public static bool IsDesired(string htmltext)
        {
            //A TOC is wanted if the token or an already rendered TOC is present
            return htmltext.Contains("[" + Token + "]") || FindOLRegex.IsMatch(htmltext);
        }

        public static string UpdateHTML(string htmltext)
        {
            Heading.Init();
            headings.Clear();

            int Level = 1;
            string HeadingText = "";
            string AttributeText = "";
            string MatchText = "";
            string PreceedingText = "";
            int FormerMatchStartPosition = 0;

            //Remove existing TOC and replace it with our token
            htmltext = ReplaceOLRegex.Replace(htmltext, "[" + Token + "]");

            //Clean out existing anchor span tag
            htmltext = ReplaceAnchorRegex.Replace(htmltext, String.Empty);

            MatchCollection HeadingMatches = AnalyzeHeadingsRegex.Matches(htmltext);

            //Analyze Headings
            for (int i = 0; i < HeadingMatches.Count; i++)
            {
                Level = Convert.ToInt32(HeadingMatches[i].Groups["LEVEL"].Value);
                HeadingText = HeadingMatches[i].Groups["RAWHEADING"].Value;
                AttributeText = HeadingMatches[i].Groups["RAWATTRIBUTES"].Value;
                MatchText = HeadingMatches[i].Groups[0].Value;
                PreceedingText = htmltext.Substring(FormerMatchStartPosition, HeadingMatches[i].Index - FormerMatchStartPosition);
                FormerMatchStartPosition = HeadingMatches[i].Index + HeadingMatches[i].Length;
                headings.Add(new Heading(Level, HeadingText, AttributeText, MatchText, PreceedingText));
            }

            //Everything after the last heading. If no heading was found this is the whole text,
            //so a post without headings keeps its content
            string TextAfterLastMatch = htmltext.Substring(FormerMatchStartPosition);

            //Build Table of contents
            string TOCHtmlText = GenerateTableOfContents();

            //Build new html file
            StringBuilder sb = new StringBuilder();
            foreach (Heading h in headings)
            {
                sb.Append(h.ToString());
            }
            sb.Append(TextAfterLastMatch);
            htmltext = sb.ToString();

            //Inject table of contents
            return htmltext.Replace("[" + Token + "]", TOCHtmlText);
        }

        private static string GenerateTableOfContents()
        {
            //Store for each heading how many levels it lies above or below its predecessor
            int PreviousLevel = Heading.LowestNumber;
            for (int i = 0; i < headings.Count; i++)
            {
                headings[i].LevelDeltaToPreviousInList = headings[i].Level - PreviousLevel;
                PreviousLevel = headings[i].Level;
            }

            StringBuilder sb = new StringBuilder();
            //Add xoxo class to ul to make it an Outline microformat: http://microformats.org/wiki/xoxo
            sb.Append("<div id=\"" + Token + "\"><" + xl + " class=\"xoxo\">");

            int stackcheck = 0;

            foreach (Heading h in headings)
            {
                if (h.LevelDeltaToPreviousInList > 0)
                {
                    //Deeper level: open one nested list per level, each under a [?] placeholder item
                    for (int i = 0; i < h.LevelDeltaToPreviousInList; i++)
                    {
                        sb.Append("<li>[?]<" + xl + ">");
                        stackcheck++;
                    }
                }

                if (h.LevelDeltaToPreviousInList < 0)
                {
                    //Higher level: close one nested list per level
                    for (int i = 0; i < (-h.LevelDeltaToPreviousInList); i++)
                    {
                        sb.Append("</" + xl + "></li>");
                        stackcheck--;
                    }
                }

                sb.Append("<li><a href=\"#" + h.AnchorString + "\">" + h.HeadingString + "</a></li>");
            }

            //Close the nested lists that are still open after the last heading
            for (int i = 0; i < Heading.LevelOfLastHeading - Heading.LowestNumber; i++)
            {
                sb.Append("</" + xl + "></li>");
                stackcheck--;
            }

            sb.Append("</" + xl + "></div>");

            //sb.Append(stackcheck.ToString()); //Uncomment the preceding statement to debug the building of the nested lists

            //Remove extra [?] list items and move nested list items directly under the parent item instead of
            //under a [?] tag
            return sb.ToString().Replace("</li><li>[?]", string.Empty);
        }

    }

    /// <summary>
    /// This class encapsulates all information that resulted from parsing the heading tags out of an html file
    /// </summary>
    public class Heading
    {
        public Heading()
        {

        }

        public Heading(int level, string heading, string attributes, string match, string preceeding)
        {
            _Identity = _IdentityCounter++;
            _Level = level;
            _HeadingString = heading;
            _AttributeString = attributes;
            _MatchString = match;
            _PreceedingString = preceeding;
            _AnchorString = TOC.AnchorPrefix + Identity.ToString();

            if (level < LowestNumber)
            {
                LowestNumber = level;
            }

            if (level > HighestNumber)
            {
                HighestNumber = level;
            }

            LevelOfLastHeading = level;
        }

        //Resets the static counters; must be called before each post or page is parsed
        public static void Init()
        {
            _IdentityCounter = 0;
            _LowestNumber = 6;
            _HighestNumber = 1;
            _LevelOfLastHeading = 6;
        }

        private int _Level;
        private int _LevelDeltaToPreviousInList;
        private string _HeadingString;
        private string _AttributeString;
        private string _MatchString;
        private string _ID;
        private string _PreceedingString;
        private static int _IdentityCounter = 0;
        private int _Identity;
        private static int _LowestNumber = 6;
        private static int _HighestNumber = 1;
        private static int _LevelOfLastHeading = 6;
        private string _AnchorString;


        //Renders the text before the heading, the anchor span and the heading itself
        public override string ToString()
        {
            return this.PreceedingString + "<span id=\"" + this.AnchorString + "\"></span>" + this.MatchString;
        }

        public int Level
        {
            get { return _Level; }
        }


        public string MatchString
        {
            get { return _MatchString; }
        }


        public string AttributeString
        {
            get { return _AttributeString; }
            set { _AttributeString = value; }
        }

        public string AnchorString
        {
            get { return _AnchorString; }
            set { _AnchorString = value; }
        }


        public string HeadingString
        {
            get { return _HeadingString; }
            set { _HeadingString = value; }
        }


        public string ID
        {
            get { return _ID; }
            set { _ID = value; }
        }


        public string PreceedingString
        {
            get { return _PreceedingString; }
        }

        public int Identity
        {
            get { return _Identity; }
        }

        public static int LowestNumber
        {
            get { return _LowestNumber; }
            set { _LowestNumber = value; }
        }

        public static int HighestNumber
        {
            get { return _HighestNumber; }
            set { _HighestNumber = value; }
        }

        public static int LevelOfLastHeading
        {
            get { return _LevelOfLastHeading; }
            set { _LevelOfLastHeading = value; }
        }

        public int LevelDeltaToPreviousInList
        {
            get { return _LevelDeltaToPreviousInList; }
            set { _LevelDeltaToPreviousInList = value; }
        }

    }

}
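
For readers who want to experiment with the nesting logic outside of BlogEngine, the following is a simplified, self-contained sketch. It is not the extension's code: instead of the [?] placeholder and string replacement used above, it tracks the open list items on a stack. The class name OutlineSketch, the Build method and the sample data are hypothetical, and the titles are assumed to be HTML-safe already.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class OutlineSketch
{
    // Builds a nested <ul> outline from parallel lists of heading levels (1-6) and titles.
    public static string Build(IList<int> levels, IList<string> titles)
    {
        if (levels.Count != titles.Count)
        {
            throw new ArgumentException("levels and titles must have the same length");
        }

        StringBuilder sb = new StringBuilder("<ul class=\"xoxo\">");
        // Levels of the list items whose <li> is still open, innermost on top.
        Stack<int> open = new Stack<int>();

        for (int i = 0; i < levels.Count; i++)
        {
            bool closedAny = false;

            // Close every open item that is not a parent of the new heading.
            while (open.Count > 0 && open.Peek() >= levels[i])
            {
                sb.Append("</li>");
                open.Pop();
                closedAny = true;

                // Leave the parent's nested list if the parent gets closed as well.
                if (open.Count > 0 && open.Peek() >= levels[i])
                {
                    sb.Append("</ul>");
                }
            }

            // First child of a still open item: start a nested list inside it.
            if (!closedAny && open.Count > 0)
            {
                sb.Append("<ul>");
            }

            sb.Append("<li><a href=\"#toc_" + i + "\">" + titles[i] + "</a>");
            open.Push(levels[i]);
        }

        // Close whatever is still open after the last heading.
        while (open.Count > 0)
        {
            sb.Append("</li>");
            open.Pop();
            if (open.Count > 0)
            {
                sb.Append("</ul>");
            }
        }

        return sb.Append("</ul>").ToString();
    }
}
```

Calling OutlineSketch.Build(new int[] { 2, 3, 2 }, new string[] { "A", "B", "C" }) nests B in a list inside the item for A and places C next to A. The stack keeps every nested list inside its parent item, so no placeholder items have to be removed afterwards.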

 

Known Issues

Opening the re-rendered post in TinyMCE modifies the HTML and strips the attributes from the span tags that mark the TOC link targets. Before, they look like <span id="toc_5"></span>. After TinyMCE has loaded the post body, they look like <span></span>. As a consequence, the extension no longer recognizes these spans and does not remove them when the post is saved from TinyMCE. The extension still works fine, because it inserts fresh anchor spans on every save, but the empty span tags stay in the body.

I need to figure out why TinyMCE modifies the body during the loading.
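
Until the cause is found, one possible workaround is to strip the leftover tags before the headings are analyzed. The sketch below is an assumption of mine, not part of the released extension: the class EmptySpanCleanup and its Remove method are hypothetical, and the regex would also delete empty, attribute-less spans that an author placed on purpose.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptySpanCleanup
{
    // Matches a span without attributes and without content, e.g. "<span></span>".
    private static readonly Regex EmptySpanRegex = new Regex(
        @"<\s*span\s*>\s*<\s*/\s*span\s*>",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns the html with all empty, attribute-less spans removed.
    public static string Remove(string html)
    {
        return EmptySpanRegex.Replace(html, string.Empty);
    }
}
```

Spans that still carry an id or that wrap text are left untouched, so the regular anchor cleanup in UpdateHTML keeps working as before.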

Enhancements

Support the configuration of the HTML list type, and possibly some CSS styles, through the BlogEngine extension configuration framework.

Add "back to TOC" links to the headings. This way it is easy to jump back to the table of contents.
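
A rough sketch of how such back links could work is shown here. Nothing in it exists in the extension yet: the class BackToTocSketch, the methods Decorate and Strip and the CSS class name backtotoc are hypothetical. The only value taken from the extension is the id tocautogen of the generated TOC div, which serves as the link target.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackToTocSketch
{
    // Id of the generated TOC div, as used by the extension.
    private const string TocId = "tocautogen";

    // Finds back links added by Decorate so they can be removed before a re-save.
    private static readonly Regex BackLinkRegex = new Regex(
        "<a\\s+class=\"backtotoc\"[^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the anchor span, the original heading markup and a link back to the TOC.
    public static string Decorate(string anchorId, string headingHtml)
    {
        return "<span id=\"" + anchorId + "\"></span>"
            + headingHtml
            + "<a class=\"backtotoc\" href=\"#" + TocId + "\">Back to TOC</a>";
    }

    // Removes previously added back links, so that saving twice does not duplicate them.
    public static string Strip(string html)
    {
        return BackLinkRegex.Replace(html, string.Empty);
    }
}
```

The Strip step mirrors what the extension already does for its anchor spans: everything it generated on the previous save is removed first and then written again.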

Summary

If your blog posts or pages span several screens, an outline with links to the individual sections makes it easier for readers to focus on the parts that interest them. They look at the outline and jump to the section that matters to them. The extension introduced in this post creates that outline, the TOC, automatically.

What a great way to save some time.

At least in my opinion.


BlogEngine.NET | C#

Comments

3/11/2008 3:43:08 AM #

Juio Tentor

It seems to be a nice extension, I will try it.

