<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>GistWeb Blog</title>
	<atom:link href="http://gistweb.com/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://gistweb.com/blog</link>
	<description></description>
	<pubDate>Sat, 14 Feb 2009 02:11:21 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
	<language>en</language>
			<item>
		<title>Gist News now provides updated news summaries.</title>
		<link>http://gistweb.com/blog/?p=15</link>
		<comments>http://gistweb.com/blog/?p=15#comments</comments>
		<pubDate>Sat, 14 Feb 2009 02:11:21 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=15</guid>
		<description><![CDATA[I&#8217;ve created a service on GistWeb that creates summaries of top news stories every 2 hours in the following categories:
1. World News
2. U.S. News
3. Business News
4. Sci/Tech News
5. Sports
6. Entertainment
7. Health
These are summaries of up to 30 articles about each news item, which help you get a much more well-rounded idea of what&#8217;s going on [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve created a service on GistWeb that creates summaries of top news stories every 2 hours in the following categories:</p>
<p>1. World News</p>
<p>2. U.S. News</p>
<p>3. Business News</p>
<p>4. Sci/Tech News</p>
<p>5. Sports</p>
<p>6. Entertainment</p>
<p>7. Health</p>
<p>These are summaries of up to 30 articles about each news item, which help you get a much more well-rounded idea of what&#8217;s going on than if you just read one article slanted to a particular opinion.</p>
<p><a href="http://gistweb.com/news.php">Click here to see the new news summaries.</a></p>
<p>Also, I&#8217;ve updated the GistLite algorithm so that it&#8217;s a lot better and reads a lot more smoothly in most instances.  That change affects both Gist Web Summaries and the new Gist News service.</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=15</wfw:commentRss>
		</item>
		<item>
		<title>Web summaries are back online, require IE.</title>
		<link>http://gistweb.com/blog/?p=14</link>
		<comments>http://gistweb.com/blog/?p=14#comments</comments>
		<pubDate>Fri, 13 Feb 2009 05:29:50 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=14</guid>
		<description><![CDATA[Okay guys.  Lots of you begged me to get GistWeb summaries working again.  It took some hacking, but it&#8217;s up and running again.
The gotcha is that you have to use Internet Explorer, and you have to set gistweb.com as a &#8220;Trusted Site&#8221; in IE.  To do this, go to the Tools -&#62; Internet Options menu.  [...]]]></description>
			<content:encoded><![CDATA[<p>Okay guys.  Lots of you begged me to get GistWeb summaries working again.  It took some hacking, but it&#8217;s up and running again.</p>
<p>The gotcha is that you have to use Internet Explorer, and you have to set gistweb.com as a &#8220;Trusted Site&#8221; in IE.  To do this, go to the Tools -&gt; Internet Options menu.  On the Security tab, click the Trusted Sites icon, and set the security level to Low (the lowest setting).</p>
<p>Then click the Sites button and add gistweb.com as a trusted site.</p>
<p>This is necessary in order for GistWeb to use JavaScript to fetch the search engine results it needs to work its magic.  I hate that I had to take that route, but it&#8217;s the only way I could come up with that didn&#8217;t require making GistWeb a desktop application (which means Windows-only users, which I didn&#8217;t want for this app).</p>
<p>I promise I won&#8217;t be making calls to any rogue sites via JavaScript and your browser. <img src='http://gistweb.com/blog/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /></p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=14</wfw:commentRss>
		</item>
		<item>
		<title>Web summaries are back online.</title>
		<link>http://gistweb.com/blog/?p=13</link>
		<comments>http://gistweb.com/blog/?p=13#comments</comments>
		<pubDate>Wed, 10 Sep 2008 19:32:33 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=13</guid>
		<description><![CDATA[Just an FYI everybody.  The GistWeb web summaries are now back up and running.  There was a problem with fetching search results that has been resolved.  Hopefully it will not occur again.
I apologize for the long delay in the fix, but with my newborn son here and GistWeb being a free resource, I had to [...]]]></description>
			<content:encoded><![CDATA[<p>Just an FYI everybody.  The GistWeb web summaries are now back up and running.  There was a problem with fetching search results that has been resolved.  Hopefully it will not occur again.</p>
<p>I apologize for the long delay in the fix, but with my newborn son here and GistWeb being a free resource, I had to focus my limited available work-time on my paying customer products.</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=13</wfw:commentRss>
		</item>
		<item>
		<title>Choosing between GistLite and GistPro.</title>
		<link>http://gistweb.com/blog/?p=10</link>
		<comments>http://gistweb.com/blog/?p=10#comments</comments>
		<pubDate>Wed, 23 Apr 2008 22:27:41 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[document summaries]]></category>

		<category><![CDATA[gistlite]]></category>

		<category><![CDATA[gistpro]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=10</guid>
		<description><![CDATA[GistLite
GistLite is the original algorithm used by GistWeb to create the multi-document summaries.  It works well on a large variety of keywords.
Choose GistLite as the algorithm for your summary if your keywords are not likely to return a lot of textual content (e.g. highly commercial keywords).  GistLite will do a better job with more text [...]]]></description>
			<content:encoded><![CDATA[<p><strong>GistLite</strong></p>
<p>GistLite is the original algorithm used by GistWeb to create the multi-document summaries.  It works well on a large variety of keywords.</p>
<p>Choose GistLite as the algorithm for your summary if your keywords are not likely to return a lot of textual content (e.g. highly commercial keywords).  GistLite will do a <em>better</em> job with more text (e.g. less commercial keywords), but if you do run keywords that are more commercial and less informational, try GistLite first.</p>
<p><strong>GistPro</strong></p>
<p>GistPro is far better at selecting what information belongs in the summary, and far better at organizing that information into groups of related material.  However, it does require that your keywords return documents that contain a good quantity of <em>information</em> (as opposed to ad text).</p>
<p>A couple of examples of keywords that GistPro does a fantastic job with are:</p>
<p><a href="http://gistweb.com/web.php?action=searchpro&amp;source=google&amp;q=Golden+Retriever&amp;sumsize=2" target="_blank">Golden Retriever</a></p>
<p><a href="http://gistweb.com/web.php?action=searchpro&amp;source=google&amp;q=osteoporosis&amp;sumsize=2" target="_blank">Osteoporosis</a></p>
<p><strong>Either Way, No Worries</strong></p>
<p>Whichever algorithm you decide to use, don&#8217;t sweat it!  It&#8217;s a one-click operation to switch to the other algorithm from the summary  page.  I posted these guidelines just to save you a little hassle if you are pretty sure which algorithm will work best for your keywords.</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=10</wfw:commentRss>
		</item>
		<item>
		<title>Summaries algo update yields superior results.</title>
		<link>http://gistweb.com/blog/?p=9</link>
		<comments>http://gistweb.com/blog/?p=9#comments</comments>
		<pubDate>Tue, 22 Apr 2008 23:34:00 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=9</guid>
		<description><![CDATA[I was looking at the results from GistWeb&#8217;s new search summaries today, and I noticed that the first paragraph was very often spot on target, and that the rest of the paragraphs, while generally very good, didn&#8217;t match the quality of the introductory paragraph.
&#8216;Well,&#8217; I said, &#8216;I&#8217;ll just apply the [...]]]></description>
			<content:encoded><![CDATA[<p>I was looking at the results from GistWeb&#8217;s new search summaries today, and I noticed that the first paragraph was very often spot on target, and that the rest of the paragraphs, while generally very good, didn&#8217;t match the quality of the introductory paragraph.</p>
<p>&#8216;Well,&#8217; I said, &#8216;I&#8217;ll just apply the same algorithm I&#8217;m using on the intro paragraph to the rest of the article and see what happens!&#8217;</p>
<p>The results were so good that I made the algo change live.  I&#8217;m seeing much less ad content, and far more targeted results in the latter paragraphs of the articles now.</p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=9</wfw:commentRss>
		</item>
		<item>
		<title>GistWeb now creates web search summaries!</title>
		<link>http://gistweb.com/blog/?p=8</link>
		<comments>http://gistweb.com/blog/?p=8#comments</comments>
		<pubDate>Tue, 22 Apr 2008 05:38:56 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=8</guid>
		<description><![CDATA[Okay guys, this is BIG news.  I&#8217;ve added a multi-document summarizer to GistWeb.  It will take the top 10, 20 or 30 pages resulting from a search query and extract only the most important information from each page and display it in a remarkably readable summary.
Here&#8217;s the link:
http://gistweb.com/web.php
It will create summaries from Google, Google News [...]]]></description>
			<content:encoded><![CDATA[<p>Okay guys, this is BIG news.  I&#8217;ve added a multi-document summarizer to GistWeb.  It will take the top 10, 20 or 30 pages resulting from a search query and extract only the most important information from each page and display it in a remarkably readable summary.</p>
<p>Here&#8217;s the link:</p>
<p><a href="http://gistweb.com/web.php">http://gistweb.com/web.php</a></p>
<p>It will create summaries from Google, Google News or Google News Archives.</p>
<p>Let me know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=8</wfw:commentRss>
		</item>
		<item>
		<title>Update ensures quotes are kept intact.</title>
		<link>http://gistweb.com/blog/?p=7</link>
		<comments>http://gistweb.com/blog/?p=7#comments</comments>
		<pubDate>Sat, 19 Apr 2008 13:42:16 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=7</guid>
		<description><![CDATA[Dr. Wellman said that the virus was &#8220;very dangerous to the preservation of the human race.  It has the potential to wipe out the world!&#8221;
Before this update, the above sentence would get split at &#8220;human race&#8221;, with GistWeb believing that to be the end of the sentence.  This new update allows the algorithm to recognize [...]]]></description>
			<content:encoded><![CDATA[<p><em>Dr. Wellman said that the virus was &#8220;very dangerous to the preservation of the human race.  It has the potential to wipe out the world!&#8221;</em></p>
<p>Before this update, the above sentence would get split at &#8220;human race&#8221;, with GistWeb believing that to be the end of the sentence.  This new update allows the algorithm to recognize when full or partial sentences are contained in quotes, and keeps them intact.</p>
<p>This also helps if an article quotes an individual saying multiple sentences.  Instead of breaking the quote off after the first sentence or two, it will now keep the full quote intact.  Since quotes are an important part of any document, this improvement helps make the gists more informative, as well as more readable.</p>
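<p>A minimal sketch of quote-aware sentence splitting, written in Python purely for illustration.  It is not GistWeb&#8217;s actual code: the function name <code>split_sentences</code>, the short abbreviation list, and the rule that a quote ending in terminal punctuation also ends the sentence are all assumptions.</p>

```python
# Hypothetical illustration only; none of these names come from GistWeb.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}  # illustrative, not exhaustive
QUOTE_PAIRS = {'"': '"', "\u201c": "\u201d"}  # opening mark -> closing mark
TERMINATORS = ".!?"


def split_sentences(text):
    """Split text into sentences without ever breaking inside a quotation."""
    sentences = []
    start = 0      # index where the current sentence begins
    closer = None  # closing quote mark we are waiting for, or None
    for i, ch in enumerate(text):
        # True when the next character is whitespace or the text ends here.
        at_boundary = text[i + 1:i + 2].strip() == ""
        ends_here = False
        if closer is not None:
            # Inside a quotation: ignore terminators until the quote closes.
            if ch == closer:
                closer = None
                # A quote that finishes on terminal punctuation ends the sentence.
                ends_here = at_boundary and text[i - 1] in TERMINATORS
        elif ch in QUOTE_PAIRS:
            closer = QUOTE_PAIRS[ch]
        elif ch in TERMINATORS and at_boundary:
            # Do not split after a known abbreviation such as "Dr.".
            last_word = text[start:i + 1].split()[-1]
            ends_here = last_word not in ABBREVIATIONS
        if ends_here:
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

<p>Run on the example sentence above followed by one more sentence, this sketch should return two sentences, with the whole quote kept in the first.  An unclosed quote suppresses all further splits, which a real implementation would need to guard against.</p>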
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=7</wfw:commentRss>
		</item>
		<item>
		<title>Modified text-culling algo for paragraph-by-paragraph analysis.</title>
		<link>http://gistweb.com/blog/?p=6</link>
		<comments>http://gistweb.com/blog/?p=6#comments</comments>
		<pubDate>Fri, 18 Apr 2008 13:27:29 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=6</guid>
		<description><![CDATA[Prior to now, GistWeb analyzed the text of an entire page to decide how much it should cut out.  While this obviously worked quite well, I&#8217;ve made it even better.
Now GistWeb looks at each paragraph in the text, and makes a decision on a paragraph-by-paragraph basis.  This has resulted in, I feel, a significant improvement [...]]]></description>
			<content:encoded><![CDATA[<p>Prior to now, GistWeb analyzed the text of an entire page to decide how much it should cut out.  While this obviously worked quite well, I&#8217;ve made it even better.</p>
<p>Now GistWeb looks at each paragraph in the text, and makes a decision on a paragraph-by-paragraph basis.  This has resulted in, I feel, a significant improvement in the amount of &#8220;fluff&#8221; being cut out.</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=6</wfw:commentRss>
		</item>
		<item>
		<title>GistWeb now shows significant breaks in the text.</title>
		<link>http://gistweb.com/blog/?p=5</link>
		<comments>http://gistweb.com/blog/?p=5#comments</comments>
		<pubDate>Thu, 17 Apr 2008 20:43:02 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=5</guid>
		<description><![CDATA[I noticed that when gisting blogs, GistWeb was lumping the comments in with the rest of the text.  This could be a bad thing, because a comment might be written in such a way that &#8212; when viewed in GistWeb &#8212; it appears to actually be part of the article.  That would be misleading, so [...]]]></description>
			<content:encoded><![CDATA[<p>I noticed that when gisting blogs, GistWeb was lumping the comments in with the rest of the text.  This could be a bad thing, because a comment might be written in such a way that &#8212; when viewed in GistWeb &#8212; it appears to actually be part of the article.  That would be misleading, so I wanted to fix it.</p>
<p>I&#8217;ve introduced logic into the parser that adds a break (in the form of an &lt;hr&gt; tag) between pieces of content that are separated by significant HTML &#8220;breaks&#8221; in the flow of the text.  Thus blog comments now appear in a separated fashion.</p>
<p>So far it looks like it&#8217;s working very well, although if an article inserts a block of ads or anything else very significant in the middle of their text (as many sites do), that will also be indicated by a break mark (&lt;hr&gt; tag).</p>
<p>From the test results, I feel this is a great benefit to the algorithm overall, and so it&#8217;s live in the system now.</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=5</wfw:commentRss>
		</item>
		<item>
		<title>Major improvement in text-parsing algorithm.</title>
		<link>http://gistweb.com/blog/?p=4</link>
		<comments>http://gistweb.com/blog/?p=4#comments</comments>
		<pubDate>Thu, 17 Apr 2008 19:10:50 +0000</pubDate>
		<dc:creator>jonathanleger</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gistweb.com/blog/?p=4</guid>
		<description><![CDATA[Okay guys, your comments on my first post got me focused on a way to improve the text-parsing algorithm that finds the &#8220;meat&#8221; on any page.
A number of folks commented that, for blogs they run through GistWeb, they were seeing parts of the navigation, footer, etc. come out in the content.
That got me to sit [...]]]></description>
			<content:encoded><![CDATA[<p>Okay guys, your comments on my first post got me focused on a way to improve the text-parsing algorithm that finds the &#8220;meat&#8221; on any page.</p>
<p>A number of folks commented that, for blogs they run through GistWeb, they were seeing parts of the navigation, footer, etc. come out in the content.</p>
<p>That got me to sit down, examine a bunch of the results, and start brainstorming a better way to approach the text-parsing algorithm.  I almost completely rewrote that part of the algorithm this afternoon.</p>
<p>The results are vastly superior, and not just for blogs, but for web sites of all kinds.  Try it out and you&#8217;ll see the difference.  Keep in mind, though, that if you try to rerun a page through GistWeb that was run less than 24 hours ago, the cache will be used and you won&#8217;t see any changes.</p>
<p>Thanks for the great feedback guys!  Keep it coming.</p>
]]></content:encoded>
			<wfw:commentRss>http://gistweb.com/blog/?feed=rss2&amp;p=4</wfw:commentRss>
		</item>
	</channel>
</rss>

