V2EX  ›  Q&A

An interview question

    supman · Apr 20, 2015 · 1926 views

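    Given some input text, count every 2-gram and how many times it occurs. What if the dataset is 100 GB? You have 100 32-bit machines (4 GB of memory each): how would you distribute the work across the 100 machines, and where is the bottleneck?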
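The question names no framework, so the following is only a MapReduce-style sketch, simulated in one process. It assumes word-level 2-grams (adjacent word pairs) with plain whitespace tokenization, and treats each input chunk as an independent piece of text. The names `map_bigrams`, `partition`, `run_job` and `NUM_REDUCERS` are hypothetical. Each mapper pre-aggregates its own counts (a combiner step), then routes every bigram to exactly one reducer by a stable hash, so each reducer only has to hold its own slice of the distinct bigrams.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 100  # assumption: one reducer per machine


def map_bigrams(text):
    """Map step: yield every adjacent word pair in one chunk of text."""
    words = text.split()  # assumption: whitespace tokenization, word-level 2-grams
    return zip(words, words[1:])


def partition(bigram, num_reducers=NUM_REDUCERS):
    """Pick the reducer that owns this bigram.

    Uses crc32 instead of Python's built-in hash(), which is salted per
    process and would route the same key differently on different machines.
    """
    key = f"{bigram[0]}\x00{bigram[1]}".encode("utf-8")
    return zlib.crc32(key) % num_reducers


def run_job(chunks, num_reducers=NUM_REDUCERS):
    """Simulate map -> combine -> shuffle -> reduce in a single process."""
    reducers = [Counter() for _ in range(num_reducers)]
    for chunk in chunks:                          # each chunk = one mapper's input
        local = Counter(map_bigrams(chunk))       # combiner: pre-aggregate on the mapper
        for bigram, n in local.items():           # shuffle: send partial counts to their owner
            reducers[partition(bigram, num_reducers)][bigram] += n
    return reducers
```

Usage: `run_job(["the cat sat on the mat", "the cat ran"], num_reducers=4)` returns four disjoint `Counter`s; summing them gives 2 for `("the", "cat")`. The combiner step means a mapper sends one partial count per distinct bigram rather than one record per occurrence, which reduces how much data crosses the network during the shuffle.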

    1 reply · 2015-04-21 17:15:50 +08:00
    mengzhuo · #1 · Apr 21, 2015 via iPhone
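    As long as it's not sorting, it's easy.
    Just split it by id.
    100 machines, 1 GB each.
    Then each one searches its own part.
    Add the starting index.
    Done.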
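The reply is terse; one possible reading is: cut the input into 100 contiguous shards of about 1 GB, give each machine its shard's starting index, count locally, then merge. A 2-gram whose two words fall on either side of a shard boundary would otherwise be lost, so in this hypothetical sketch each shard also peeks at the first word of the next shard. It assumes word-level 2-grams over an in-memory word list; a real machine would instead read its byte range from disk and align it to a word boundary. The names `shard_starts`, `count_shard` and `count_all` are illustrative only.

```python
from collections import Counter


def shard_starts(num_words, num_shards):
    """Starting index of each shard: contiguous, near-equal ranges."""
    return [i * num_words // num_shards for i in range(num_shards)]


def count_shard(words, start, end):
    """Count the bigrams whose first word lies in [start, end).

    Reading one word past `end` (the next shard's first word) picks up the
    bigram that straddles the boundary, so no pair is lost or double-counted.
    """
    stop = min(end + 1, len(words))
    span = words[start:stop]
    return Counter(zip(span, span[1:]))


def count_all(words, num_shards):
    """Count every shard (one machine each, in practice) and merge the results."""
    starts = shard_starts(len(words), num_shards)
    ends = starts[1:] + [len(words)]
    total = Counter()
    for start, end in zip(starts, ends):
        total.update(count_shard(words, start, end))  # adds counts, does not overwrite
    return total
```

Usage: for `words = "the cat sat on the mat the cat ran".split()`, `count_all(words, 3)` equals `Counter(zip(words, words[1:]))` for any shard count. The per-shard `Counter`s still have to be merged into one result, which this single-process sketch does with a plain `update`.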