V2EX  ›  TensorFlow

word2vec + LSTM sentiment classification: please help me find what went wrong

    KarlRixon · 2019-02-24 10:42:10 +08:00 · 6533 views

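    The data are reviews and star ratings for new phone models on JD.com. So far I have collected 2,500 samples, but fewer than 40 of them have a rating below 5 stars.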

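    The model layers are: Embedding -> LSTM -> Dense -> Dense -> output layer. The embedding layer is initialized with word vectors trained by word2vec. The training inputs are word indices, and the labels are the star ratings.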

    Part of the code:

    def main():
        x_train = pad_seq()
        y_train = star()
        x_train, y_train, x_test, y_test = set_data(x_train, y_train)
        model = Sequential()
        model.add(Embedding(input_dim=input_dim+1, output_dim=output_dim, input_length=k, embeddings_initializer=my_init))
        model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2, activation='sigmoid'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(128, activation='relu'))
        model.add(Dense(1,activation='sigmoid'))
        model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
        model.fit(x_train, y_train, batch_size=batch_size, epochs=15)
        score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
        print('Test score:', score)
        print('Test accuracy:', acc)
    

    However, the training results look like this...

    Epoch 15/15
    
      32/2015 [..............................] - ETA: 0s - loss: -63.2713 - acc: 0.0000e+00
     160/2015 [=>............................] - ETA: 0s - loss: -63.4706 - acc: 0.0000e+00
     288/2015 [===>..........................] - ETA: 0s - loss: -63.5481 - acc: 0.0000e+00
     384/2015 [====>.........................] - ETA: 0s - loss: -63.2713 - acc: 0.0026    
     480/2015 [======>.......................] - ETA: 0s - loss: -63.3046 - acc: 0.0021
     608/2015 [========>.....................] - ETA: 0s - loss: -63.4024 - acc: 0.0016
     736/2015 [=========>....................] - ETA: 0s - loss: -63.3796 - acc: 0.0027
     864/2015 [===========>..................] - ETA: 0s - loss: -63.3821 - acc: 0.0023
     992/2015 [=============>................] - ETA: 0s - loss: -63.3838 - acc: 0.0020
    1120/2015 [===============>..............] - ETA: 0s - loss: -63.3852 - acc: 0.0018
    1248/2015 [=================>............] - ETA: 0s - loss: -63.3991 - acc: 0.0016
    1376/2015 [===================>..........] - ETA: 0s - loss: -63.4104 - acc: 0.0015
    1504/2015 [=====================>........] - ETA: 0s - loss: -63.3879 - acc: 0.0020
    1632/2015 [=======================>......] - ETA: 0s - loss: -63.4081 - acc: 0.0018
    1760/2015 [=========================>....] - ETA: 0s - loss: -63.4344 - acc: 0.0017
    1888/2015 [===========================>..] - ETA: 0s - loss: -63.4233 - acc: 0.0021
    2015/2015 [==============================] - 1s 470us/step - loss: -63.4214 - acc: 0.0020
    
     32/504 [>.............................] - ETA: 2s
    504/504 [==============================] - 0s 412us/step
    Test score: -63.769539061046785
    Test accuracy: 0.0
    
    Process finished with exit code 0
    

    The model summary:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    embedding_1 (Embedding)      (None, 20, 50)            49350     
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 128)               91648     
    _________________________________________________________________
    dense_1 (Dense)              (None, 256)               33024     
    _________________________________________________________________
    dense_2 (Dense)              (None, 128)               32896     
    _________________________________________________________________
    dense_3 (Dense)              (None, 1)                 129       
    =================================================================
    Total params: 207,047
    Trainable params: 207,047
    Non-trainable params: 0
    _________________________________________________________________
    
    10 replies · 2019-02-25 18:21:45 +08:00
    #1 wz74666291 · 2019-02-24 12:35:28 +08:00 via iPhone
    Try reducing the step size in fit, e.g. to 10e-5.
    #2 hanbing135 · 2019-02-24 14:17:12 +08:00 via Android
    I don't understand any of this.
    #3 aREMbosAl · 2019-02-24 14:22:51 +08:00
    The label is the rating? But the loss you are using is for binary classification.
    #4 ayase252 · 2019-02-24 15:19:10 +08:00
    A negative loss? That violates the definition of a cost function.
    Think carefully about what the output is.
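Editor's sketch of why the loss can go negative: binary cross-entropy is only guaranteed to be non-negative when the label is 0 or 1. The snippet below is a plain-Python illustration, not the Keras implementation; the clipping constant `EPS = 1e-7` is an assumption (it is believed to match the Keras default epsilon), and the function name is illustrative.

```python
import math

EPS = 1e-7  # assumed clipping constant, not taken from the thread


def binary_crossentropy(y_true, p):
    """Per-sample binary cross-entropy with p clipped to [EPS, 1 - EPS]."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


# A valid 0/1 label: the loss is positive.
loss_valid = binary_crossentropy(1.0, 0.9)

# A raw 5-star rating used as the label, with the sigmoid saturated near 1:
# the (1 - y_true) factor becomes -4, which flips the sign of the second term.
loss_raw = binary_crossentropy(5.0, 1.0)

print("label 1, p=0.9:", loss_valid)
print("label 5, p~1.0:", loss_raw)
```

Under these assumptions the second value comes out near -64.5, the same range as the logged -63.4, which is what one would expect if nearly all labels are 5.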
    #5 douglas1997 · 2019-02-24 15:52:42 +08:00
    I just want to know why the Acc metric is still stuck at 0.001~0.002 at Epoch 15... It has not been optimized at all.
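Editor's sketch of why the accuracy stays near zero: with a binary loss, the accuracy metric compares the prediction rounded to 0 or 1 against the label. If the labels are raw ratings from 1 to 5, only 1-star samples can ever match. The ratings and predictions below are hypothetical, and `binary_accuracy` is a plain-Python stand-in, not the Keras function.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    hits = 0
    for t, p in zip(y_true, y_pred):
        predicted_class = 1.0 if p > threshold else 0.0
        if t == predicted_class:
            hits += 1
    return hits / len(y_true)


# Hypothetical ratings, mostly 5 stars, used directly as labels.
stars = [5, 5, 5, 4, 5, 1, 5, 5, 3, 5]

# A saturated sigmoid output: the model predicts "1" for every sample.
preds = [0.99] * len(stars)

# Only the single 1-star sample equals the rounded prediction.
acc_raw = binary_accuracy(stars, preds)
print("accuracy with raw ratings as labels:", acc_raw)
```

In the thread's data, an accuracy of about 0.002 on 2015 samples would correspond to roughly four samples whose label happens to equal 1.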
    #6 zzj0311 · 2019-02-24 16:11:39 +08:00 via Android
    What is going on with the negative loss... What did you write this with, sklearn?
    #7 zzj0311 · 2019-02-24 16:14:48 +08:00 via Android
    So what you feed in is the rating and the output is a binary classification? There is no way that works.
    #8 KarlRixon (OP) · 2019-02-25 18:18:20 +08:00
    @zzj0311 I am using Keras.
    #9 KarlRixon (OP) · 2019-02-25 18:20:27 +08:00
    @zzj0311 Now I remember: 1-3 stars are negative reviews and 4-5 stars are positive reviews.
    #10 KarlRixon (OP) · 2019-02-25 18:21:45 +08:00
    @ayase252 1-3 stars are negative reviews and 4-5 stars are positive reviews. I probably skipped that processing step and fed the raw ratings in directly.
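Editor's sketch of the missing preprocessing step described above: map each rating to 0 (negative, 1-3 stars) or 1 (positive, 4-5 stars) before training with `binary_crossentropy`. The helper name `to_binary_labels` and the sample ratings are hypothetical; the thread does not show what `star()` returns, so a list of integer ratings is assumed.

```python
from collections import Counter


def to_binary_labels(stars, positive_from=4):
    """Map 1-5 star ratings to 0 (negative) or 1 (positive)."""
    labels = []
    for s in stars:
        if not 1 <= s <= 5:
            raise ValueError("rating out of range: %r" % (s,))
        labels.append(1 if s >= positive_from else 0)
    return labels


# Hypothetical ratings standing in for the output of star().
ratings = [5, 4, 3, 1, 5]
labels = to_binary_labels(ratings)
print(labels)

# Checking the class balance is worthwhile here, since the post says
# that fewer than 40 of the 2,500 reviews are rated below 5 stars.
counts = Counter(labels)
print("negative:", counts[0], "positive:", counts[1])
```

The resulting list would then be converted to an array and used as `y_train` in place of the raw ratings. With so few negative samples, a model that always predicts "positive" would still score a high accuracy, so the accuracy figure alone should be read with care.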