【Question title】: Get central content from a Web Page
【Posted】: 2011-05-28 08:26:46
【Question】:

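What are the possible approaches for extracting the central content of a web page?

By central content I mean the most important content on the page.

For example, on the page http://techcrunch.com/2011/05/27/iphone-app-notifies-you-when-your-laundrys-done/

the central content is: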

<p><img src="http://tctechcrunch.files.wordpress.com/2011/05/screen-shot-2011-05-27-at-10-11-36-pm.png" alt=""><br>
The folks that brought you <a href="http://itsthisforthat.com/">It’sthisforthat</a> have created another way to make your life just a little bit easier and funnier. Meet&nbsp;<a href="http://www.dryerbro.com">DryerBro</a>, an app that uses an accelerometer to let you know when your laundry’s done.</p>
<p>With DryerBro you put your iPhone or iTouch on your laundry machine and it texts you and the remaining members of your laundry party when your laundry’s done. I’m thinking this is going to be HUGE. I mean Facebook took off at colleges right?</p>
<p>Once set up, DryerBro uses an accelerometer and Twilio to send a SMS, email or call to multiple phones when your unmentionables are ready to be picked up.</p>
<p>Says creator Eric Kerr, “We live in a house with 11 dudes, and we’re seriously unorganized about laundry. We all want to use the machine on the weekends, but no one ever knows when the last load was done. It bothered me as hackers that we had the tools (accelerometer, Twilio) to solve the problem, but didn’t do anything about it.”</p>
<p>So they built DryerBro. “We originally looked to see if an app already used the accelerometer to detect when your laundry is done but we couldn’t find anything – it’s a blue ocean strategy,” he says.</p>
<p>Kerr and company are completely ridiculous, but their thing apparently works. When asked about future plans for DryerBro he told TechCrunch:</p>
<p>“Ultimately we want to build out a hyper-local group buying ad platform for laundry detergents. Rough back of the napkin calculations indicate that we’d need roughly $41 million in financing, so we’re asking friends and family to help pony up the dough. We also want to build out the map of every active dryer in the world to hang on the wall of our office.”</p>
<p>Both the DryerBro<a href="http://dryerbro.com/"> FAQ</a> and Promo video are awesome. You can download the iPhone <a href="http://itunes.apple.com/us/app/dryer-bro/id425920156?mt=8">app here.</a>&nbsp;Promo video below.</p>
<div style="text-align:center;">
<object type="application/x-shockwave-flash" width="620" height="300" data="http://www.vimeo.com/moogaloop.swf?clip_id=20732587&amp;server=www.vimeo.com&amp;fullscreen=1&amp;show_title=1&amp;show_byline=0&amp;show_portrait=0&amp;color=01AAEA">
<param name="quality" value="best">
<param name="allowfullscreen" value="true">
<param name="scale" value="showAll">
<param name="movie" value="http://www.vimeo.com/moogaloop.swf?clip_id=20732587&amp;server=www.vimeo.com&amp;fullscreen=1&amp;show_title=1&amp;show_byline=0&amp;show_portrait=0&amp;color=01AAEA">
<param name="wmode" value="opaque">
</object>
</div>

Any pointers in this direction would be helpful.

Thanks

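【Comments on the question】:

  • You have to define how important content is identified, i.e. what makes content important.
  • That is exactly the point of the question: is there any NLP method for getting the most important part of the content? One approach is to compare the page with similar pages and remove the parts they share, such as headers, menus and sidebars, but that is not NLP.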
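
The comparison approach mentioned in the last comment could be sketched roughly as follows. This is only a minimal illustration, assuming that boilerplate shows up as identical text chunks on sibling pages of the same site; the names `TextBlockParser`, `text_blocks` and `unique_blocks` are made up for this example, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextBlockParser(HTMLParser):
    """Collects visible text chunks, skipping script and style bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Normalise whitespace so identical chunks compare equal.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def text_blocks(html):
    """Return the list of text chunks found in an HTML string."""
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


def unique_blocks(page_html, sibling_htmls):
    """Keep chunks of page_html that appear in none of the sibling pages."""
    shared = set()
    for sibling in sibling_htmls:
        shared.update(text_blocks(sibling))
    return [block for block in text_blocks(page_html) if block not in shared]
```

Calling `unique_blocks(article_html, [other_article_html])` with two articles from the same site should drop menus and footers that are word-for-word identical. Real templates often vary slightly between pages (dates, counters, "related posts"), so exact matching is a simplification.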

Tags: parsing nlp semantic-markup


【Solution 1】:

There is an extensive discussion of this topic on metaoptimize.

One of the posts there points to a list of resources and an overview of approaches, both on Tomaz Kovacic's blog and both very good.

【Comments】:

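    【Solution 2】:

    I think you need automatic summarisation: algorithms that extract the most central sentences from a text. For an example of how it works, you can look at my implementation of several algorithms.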
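
To give a feel for what extractive summarisation does, here is a minimal sketch. It is not the implementation linked in this answer; it assumes a simple word-frequency scoring scheme, English text, and a tiny hand-picked stopword list, and the function names are invented for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "of", "to", "and", "in",
    "it", "you", "when", "that", "this", "for", "on", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

To apply this to a web page, the HTML would first have to be reduced to plain text. Sentences that share many words with the rest of the text score highest, which tends to favour the article body over isolated navigation labels.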

    【Comments】:
