Update: Something went wrong with the code snippets; should be shown correctly now!
Now that we have MGrammar mode running correctly in Intellipad, let’s try out some stuff.
Let’s create a language that understands a textual representation of the title, site URL, feed URL and email address of an RSS feed. I also want the language to skip whitespace and comments, and to validate the email address.
So as sample instance data I wrote this:
Title: inwit.nl
Url: http://inwit.nl
RssFeedUrl: http://feeds.feedburner.com/inwitnl
Email: rj@vanholland.net

//this is comment
/*
this is also comment

*/

Title: IntellipadBlog
Url: http://blogs.msdn.com/intellipad
RssFeedUrl: http://blogs.msdn.com/intellipad/rss.xml
Email: oslo@microsoft.com
As you can see, these are just two instances of a Feed type with some comments and whitespace in between.
Now, let’s write a language that swallows this data. What I in fact did was create three languages:
- A common language with some stuff you’d want to use more often; perhaps the Email Language should be moved here also.
- The language that understands an Email Address
- The actual RSS language
The Common Language
This language handles whitespace and comments. After watching the demo given at the PDC and looking around in the “C:\Program Files\Microsoft Oslo SDK 1.0\Samples\MGrammar\Languages” directory of the SDK installation, I came up with this:
language InwitCommon
{
    token Skippable = Whitespace | Comment;

    token Comment = CommentToken;
    token CommentToken
        = CommentDelimited
        | CommentLine;
    token CommentDelimited = "/*" CommentDelimitedContent* "*/";
    token CommentDelimitedContent =
        ^('*')
        | '*' ^('/');
    token CommentLine = "//" CommentLineContent*;
    token CommentLineContent = ^(
        '\u000A' // New Line
        | '\u000D' // Carriage Return
        | '\u0085' // Next Line
        | '\u2028' // Line Separator
        | '\u2029'); // Paragraph Separator

    token Whitespace = WhitespaceToken+;
    token WhitespaceToken = WhitespaceCharacter+;
    token WhitespaceCharacter
        = '\u0009' // Horizontal Tab
        | '\u000B' // Vertical Tab
        | '\u000C' // Form Feed
        | '\u0020' // Space
        | NewLineCharacter;

    token NewLineCharacter
        = '\u000A' // New Line
        | '\u000D' // Carriage Return
        | '\u0085' // Next Line
        | '\u2028' // Line Separator
        | '\u2029'; // Paragraph Separator
}
Now, in your own language you can declare the Skippable token from this language as an interleave; this lets your language skip whitespace and comments.
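To make the skipping behaviour concrete, here is a minimal sketch in Python rather than MGrammar. It only approximates the InwitCommon tokens with regular expressions, and the names SKIPPABLE, WORD and significant_words are made up for this illustration. Skippable text is only looked for between words, which is roughly how an interleave behaves, so the "//" inside a URL is not mistaken for a comment.

```python
import re

# Assumption: these patterns approximate the InwitCommon tokens; they are not
# a faithful port of MGrammar's matching rules.
NEWLINES = "\n\r\u0085\u2028\u2029"
WHITESPACE_CHARS = "\t\v\f " + NEWLINES

SKIPPABLE = re.compile(
    r"/\*.*?\*/"                        # delimited comment: /* ... */
    + "|//[^" + NEWLINES + "]*"         # line comment: // up to the line break
    + "|[" + WHITESPACE_CHARS + "]+",   # run of whitespace characters
    re.DOTALL,
)
# A "word" is any run of characters that are not whitespace.
WORD = re.compile("[^" + WHITESPACE_CHARS + "]+")


def significant_words(text):
    """Return the words of text, skipping whitespace and comments between them."""
    words, pos = [], 0
    while pos < len(text):
        skipped = SKIPPABLE.match(text, pos)
        if skipped:
            pos = skipped.end()
            continue
        word = WORD.match(text, pos)
        words.append(word.group())
        pos = word.end()
    return words
```

Calling significant_words on the sample data above should return only the labels and values. One known limitation of this sketch is that a comment glued to the end of a word without a space is treated as part of that word.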
The Email Language
I wanted to have some sort of Email address validation within my language. So I came up with this:
language EmailAddressLanguage
{
    token EmailAddress =
        localpart
        at
        domainpart;

    token abzABZ = ('A'..'Z' | 'a'..'z')+;
    token digits = ('0'..'9')+;
    token otherChars = ('!' | '#' | '$' | '%' | '&' | "'" | '*' | '+' | '-' | '/' | '=' | '?' | '^' | '_' | '`' | '{' | '|' | '}' | '~')+;
    token allButDot = (abzABZ | digits | otherChars)+;
    token all = (allButDot | dot)+;
    token dot = ('.')#1;

    token localpart =
        (allButDot)+ |
        allButDot dot all* allButDot+;

    token at = "@";

    token domainpart =
        (allButDot)+ dot all* allButDot+;
}
It’s far from perfect! It validates email addresses, but in some cases it doesn’t work correctly yet:
An email address like “bla..bla@hotmail..com” will validate, because the ‘all’ token accepts any run of dots between the first and last parts. I haven’t looked much deeper into it yet, because this was just a small test, but if someone feels like improving this part, please do so and post a comment with your solution!
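One possible way to close the consecutive-dots gap is to require that every dot sits between two non-empty runs of non-dot characters. The sketch below shows that shape in Python rather than MGrammar; the names ATOM, DOT_ATOM, DOMAIN and is_valid_email are invented for this illustration. The character set is copied from the tokens above, and the domain still needs at least one dot, as in the original domainpart. I would expect the same structure to carry over to the token definitions, but that is an assumption that has not been checked against the MGrammar compiler.

```python
import re

# Letters, digits and the characters from the otherChars token (no dot, no @).
ATOM_CHARS = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
ATOM = "[" + ATOM_CHARS + "]+"

# Local part: one or more atoms separated by single dots.
DOT_ATOM = ATOM + r"(?:\." + ATOM + ")*"
# Domain part: at least two atoms separated by single dots.
DOMAIN = ATOM + r"(?:\." + ATOM + ")+"

EMAIL = re.compile(DOT_ATOM + "@" + DOMAIN)


def is_valid_email(text):
    """Return True when text has the shape atom(.atom)*@atom(.atom)+."""
    return EMAIL.fullmatch(text) is not None
```

With this shape, the two addresses from the sample data are accepted, while “bla..bla@hotmail..com” is rejected because an empty run between two dots is not allowed.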
The RSS Language
Then, I wrote the RSS language itself, which looks like this:
language RssLanguage
{
    syntax Main = f:Feeds => f;

    syntax Feeds = Feed*;

    syntax Feed =
        "Title" ":" t:Title
        "Url" ":" u:Url
        "RssFeedUrl" ":" r:RssFeedUrl
        "Email" ":" e:EmailAddressLanguage.EmailAddress
        =>
        Feed{
            Title{t},
            Url{u},
            RSS{r},
            Email{e}
        };

    @{Classification["Keyword"]} token Title = ('A'..'Z' | 'a'..'z' | '.')+;

    token Url = "http://" ('A'..'Z' | 'a'..'z' | '.' | '/')+;

    token RssFeedUrl = Url;

    interleave WhiteSpacing = " " | "\r" | "\n";
    interleave Skippable = InwitCommon.Skippable;
}
It defines Main as a sequence called ‘Feeds’, which contains items of type Feed. Each input Feed consists of a Title, Url, RssFeedUrl and Email, and is projected onto a Feed node with Title, Url, RSS and Email elements.
You can see that I use the EmailAddressLanguage and the InwitCommon language within this language.
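To show what that projection amounts to, here is a rough Python sketch of the Feed rule. It is not how MGrammar works internally: the names FIELD, ORDER, OUTPUT_NAMES and parse_feeds are invented here, values are matched loosely as any run of non-whitespace characters, and the input is assumed to have had its comments removed already.

```python
import re

# One labelled field: a keyword, a colon, and a value without whitespace.
FIELD = re.compile(r"\b(Title|Url|RssFeedUrl|Email)\s*:\s*(\S+)")

# The Feed rule expects the four fields in exactly this order.
ORDER = ["Title", "Url", "RssFeedUrl", "Email"]

# Input label -> element name in the output; RssFeedUrl is renamed to RSS.
OUTPUT_NAMES = {"Title": "Title", "Url": "Url", "RssFeedUrl": "RSS", "Email": "Email"}


def parse_feeds(text):
    """Group the fields in text into Feed records, four fields per feed."""
    fields = FIELD.findall(text)
    if len(fields) % len(ORDER) != 0:
        raise ValueError("incomplete feed at end of input")
    feeds = []
    for start in range(0, len(fields), len(ORDER)):
        group = fields[start:start + len(ORDER)]
        if [name for name, _ in group] != ORDER:
            raise ValueError("fields out of order in feed %d" % (start // len(ORDER) + 1))
        feeds.append({"Feed": {OUTPUT_NAMES[name]: value for name, value in group}})
    return feeds
```

Run on the two feeds from the sample data, this should produce a list of two Feed records, each with Title, Url, RSS and Email entries, which is the same shape the grammar's projection describes.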
Full Listing
To simplify, here is the full listing in one module:
module inwit
{
    language RssLanguage
    {
        syntax Main = f:Feeds => f;

        syntax Feeds = Feed*;

        syntax Feed =
            "Title" ":" t:Title
            "Url" ":" u:Url
            "RssFeedUrl" ":" r:RssFeedUrl
            "Email" ":" e:EmailAddressLanguage.EmailAddress
            =>
            Feed{
                Title{t},
                Url{u},
                RSS{r},
                Email{e}
            };

        @{Classification["Keyword"]} token Title = ('A'..'Z' | 'a'..'z' | '.')+;

        token Url = "http://" ('A'..'Z' | 'a'..'z' | '.' | '/')+;

        token RssFeedUrl = Url;

        interleave WhiteSpacing = " " | "\r" | "\n";
        interleave Skippable = InwitCommon.Skippable;
    }

    language EmailAddressLanguage
    {
        token EmailAddress =
            localpart
            at
            domainpart;

        token abzABZ = ('A'..'Z' | 'a'..'z')+;
        token digits = ('0'..'9')+;
        token otherChars = ('!' | '#' | '$' | '%' | '&' | "'" | '*' | '+' | '-' | '/' | '=' | '?' | '^' | '_' | '`' | '{' | '|' | '}' | '~')+;
        token allButDot = (abzABZ | digits | otherChars)+;
        token all = (allButDot | dot)+;
        token dot = ('.')#1;

        token localpart =
            (allButDot)+ |
            allButDot dot all* allButDot+;

        token at = "@";

        token domainpart =
            (allButDot)+ dot all* allButDot+;
    }

    language InwitCommon
    {
        token Skippable = Whitespace | Comment;

        token Comment = CommentToken;
        token CommentToken
            = CommentDelimited
            | CommentLine;
        token CommentDelimited = "/*" CommentDelimitedContent* "*/";
        token CommentDelimitedContent =
            ^('*')
            | '*' ^('/');
        token CommentLine = "//" CommentLineContent*;
        token CommentLineContent = ^(
            '\u000A' // New Line
            | '\u000D' // Carriage Return
            | '\u0085' // Next Line
            | '\u2028' // Line Separator
            | '\u2029'); // Paragraph Separator

        token Whitespace = WhitespaceToken+;
        token WhitespaceToken = WhitespaceCharacter+;
        token WhitespaceCharacter
            = '\u0009' // Horizontal Tab
            | '\u000B' // Vertical Tab
            | '\u000C' // Form Feed
            | '\u0020' // Space
            | NewLineCharacter;

        token NewLineCharacter
            = '\u000A' // New Line
            | '\u000D' // Carriage Return
            | '\u0085' // Next Line
            | '\u2028' // Line Separator
            | '\u2029'; // Paragraph Separator
    }
}
And this is what it looks like when writing it within Intellipad:
Language Compilation
The next step is to compile the module ‘RSSLanguage.mg’ I just created; we use the mg.exe compiler provided by the Oslo SDK to do this:
We get an .MGX file out of this. I renamed it to a file with a .ZIP extension and tried to open it, but it’s password protected. Does anyone know the secret password? :)
Run-time Language utilization
Last but not least, I’d like to use my language within the .NET runtime. Luckily, the Oslo SDK provides us with some base classes to do this. I created a new C# Console Application to test things out.
First, add references to the System.Dataflow and Microsoft.M.Grammar assemblies, which can be found in the Bin directory of the Oslo SDK:
Then, I wrote this code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Dataflow; // DynamicParser, GraphBuilder
using Microsoft.M.Grammar; // MGrammarCompiler

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                string imageFileName = @"C:\Users\Robert Jan\Desktop\My Documents\Oslo\MyOslo\ConsoleApplication\RssLanguage.mgx";
                string inputFileName = @"C:\Users\Robert Jan\Desktop\My Documents\Oslo\MyOslo\ConsoleApplication\FeedsInput.m";
                //inwit == module name
                //RssLanguage == language name
                string parserName = "inwit.RssLanguage";

                DynamicParser parser = MGrammarCompiler.LoadParserFromMgx(imageFileName, parserName);

                object output = parser.ParseObject(inputFileName, ErrorReporter.Standard);

                Helper.WalkMGraphTree(output);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            Console.ReadLine();
        }
    }
}
First, I create a DynamicParser instance by passing in the compiled language image file (the .MGX file) and the parser name. The parser name is the module name and the language name joined with a dot, here “inwit.RssLanguage”.
I then parse the input file using the ParseObject method, which returns the result graph.
I wrote a nice Helper function that walks the result tree, and outputs its contents to the Console. Feel free to use it yourself (after giving me a comment here of course :)).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Dataflow;

namespace ConsoleApplication
{
    class Helper
    {
        public static void WalkMGraphTree(object rootNode)
        {
            IGraphBuilder builder = new GraphBuilder();
            WalkNode(rootNode, builder);
        }

        private static void WalkNode(object node, IGraphBuilder builder)
        {
            if (node.GetType().Name == "SequenceNode")
            {
                // Sequence node: walk every element in order.
                foreach (object sequenceElement in builder.GetSequenceElements(node))
                {
                    WalkNode(sequenceElement, builder);
                }
                Console.WriteLine();
            }
            else if (node.GetType().Name == "SimpleNode")
            {
                // Labelled node: print the label, then walk its successors.
                Identifier id = builder.GetLabel(node) as Identifier;
                WriteLine(id.Text, false);
                foreach (object successorElement in builder.GetSuccessors(node))
                {
                    WalkNode(successorElement, builder);
                }
                Console.WriteLine();
            }
            else
            {
                // Anything else is treated as a leaf value and printed as text.
                WriteLine(Convert.ToString(node), true);
            }
        }

        // Writes the text followed by a space, and optionally ends the line.
        private static void WriteLine(string line, bool newline)
        {
            Console.Write(line + " ");
            if (newline)
            {
                Console.Write(Environment.NewLine);
            }
        }
    }
}
Now when I run the Console App, the output looks like this:
Summary
Here’s the summary of the steps I took, and the end result accomplished:
- First, we created our languages; we split some functionality into separate languages and used these within the RssLanguage.
- We created some input data and tested the languages against that input data within Intellipad.
- We compiled the languages with mg.exe into an .MGX image file.
- We created a .NET application that loads the image file and parses the input data with the language.
- We created a Helper method that walks the result graph and shows us the result in the Console.
Valuable links
Steef-Jan gave some pretty good links last Monday. I’d like to highlight one of those and give you two others:
Go and read what Martin Fowler has to say about Oslo and also check out what MSDN has to say about MGrammar: