
A language detector in 45 lines of Python

amazing994 · 2018-01-11 16:24:29 +08:00 · 1726 views
    class NGram(object):
        def __init__(self, text, n=3):
            self.length = None
            self.n = n
            self.table = {}
            self.parse_text(text)
            self.calculate_length()

        def parse_text(self, text):
            chars = ' ' * self.n  # initial sequence of spaces with length n

            # Collapse runs of whitespace, then slide an n-character window
            # over the text one letter at a time.
            for letter in (" ".join(text.split()) + " "):
                chars = chars[1:] + letter  # append letter to sequence of length n
                self.table[chars] = self.table.get(chars, 0) + 1  # increment count

        def calculate_length(self):
            """ Treat the N-Gram table as a vector and return its scalar magnitude
            to be used for performing a vector-based search.
            """
            self.length = sum(x * x for x in self.table.values()) ** 0.5
            return self.length

        def __sub__(self, other):
            """ Find the difference between two NGram objects by finding the cosine
            of the angle between the two vector representations of the table of
            N-Grams. Return a float value between 0 and 1 where 0 indicates that
            the two NGrams have identical relative N-Gram frequencies.
            """
            if not isinstance(other, NGram):
                raise TypeError("Can't compare NGram with non-NGram object.")

            if self.n != other.n:
                raise TypeError("Can't compare NGram objects of different size.")

            # Dot product over the N-Grams present in both tables.
            total = 0
            for k in self.table:
                total += self.table[k] * other.table.get(k, 0)

            return 1.0 - float(total) / (float(self.length) * float(other.length))

        def find_match(self, languages):
            """ Out of a list of NGrams that represent individual languages, return
            the best match.
            """
            # The comparison function must be passed as key=; a bare second
            # positional argument would be treated as another value to compare.
            return min(languages, key=lambda n: self - n)
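
The post shows the class but no usage, so here is a minimal, self-contained sketch of the same idea: character trigram counts compared by cosine distance. It is a restatement using `collections.Counter`, not the class above. The helper names (`ngram_counts`, `cosine_distance`, `detect`) and the two toy training sentences are assumptions made up for illustration; a usable detector would need far more text per language.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n=3):
    """Count overlapping character n-grams, starting from a window of n spaces."""
    normalized = " ".join(text.split()) + " "  # collapse whitespace, add trailing space
    window = " " * n
    counts = Counter()
    for letter in normalized:
        window = window[1:] + letter  # slide the window by one character
        counts[window] += 1
    return counts


def cosine_distance(a, b):
    """Return 1 - cos(angle) between two n-gram count vectors (0 means same direction)."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def detect(text, profiles):
    """Return the name of the profile with the smallest cosine distance to text."""
    sample = ngram_counts(text)
    return min(profiles, key=lambda name: cosine_distance(sample, profiles[name]))


# Toy training text, invented for this example only.
PROFILES = {
    "english": ngram_counts(
        "the dog and the cat are in the garden and the fox is in the forest"
    ),
    "german": ngram_counts(
        "der hund und die katze sind im garten und der fuchs ist im wald"
    ),
}

print(detect("the fox and the dog", PROFILES))
```

With the class from the post, the equivalent call would be roughly `NGram(sample_text).find_match(list_of_language_ngrams)`, which returns the matching `NGram` object rather than a name.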


    For more code, contact QQ 1132032275.