모델 학습 기술
과소적합과 과대적합
IMDB 딥러닝 모델 예제
전체 배치(Full Batch)
Stochastic
Mini Batch(미니 배치)
Q. 전체 배치 방식이 있음에도, 미니 배치를 사용하는 이유
A. 딥러닝의 경우, 대용량의 데이터를 사용하기 때문에 모든 데이터를 한번에 로드해서 학습하게되면 리소스 낭비가 크기 때문
데이터 스케일이 다르면 -> 딥러닝 모델 학습 시 동작이 잘 안됨
모든 특성의 범위 또는 분포는 같게 해주는 것이 바람직
표준화(Standardization)
정규화(Normalization)
Q. 표준화(Standardization) vs 정규화(Normalization)
A. 표준화는 평균을 0으로 만드는 것이고, 분산을 1로 만들지만 정규화는 최솟값을 0, 최댓값을 1로 만드는 것에 차이가 있음
Q. 손실함수의 최저값에 도달하기 위해, 학습률이 작을 수록 일반적으로 에폭수는 어떻게 달라지는지?
A. 에폭수는 많아짐(업데이트 횟수가 더 많이 필요할 것이기 때문)
시그모이드(Sigmoid)
계열, ReLU
계열Q. 두 계열간의 차이는?
A. 선형과 비선형
- 시그모이드 계열 : 결과값이 [0,1] or [-1,1] 사이 값으로 나옴
- 렐루 계열 : (예외 제외 시) 0을 중심으로 양수는 그대로, 음수는 0 또는 0과 가까운 수로 나옴
적절한 가중치 초기화 방법
Q. 적절한 가중치 초기값을 정해주는 것의 효과는?
A. 표현 가능한 신경망 수가 많아짐, 더 많은 가중치에 역전파 전달 가능, 비교적 많은 문제 표현 가능
Q. 옵티마이저 역할 및 목적
A. 손실함수 감소를 위함(실제 및 예측 값 간의 차이 감소) -> 가중치 업데이트 방식을 결정
from keras.datasets import imdb
import numpy as np
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
print(train_data[0])
print(train_labels[0])
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 0s 0us/step
17473536/17464789 [==============================] - 0s 0us/step
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
1
word_index = imdb.get_word_index()
word_index
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step
1654784/1641221 [==============================] - 0s 0us/step
{'fawn': 34701,
'tsukino': 52006,
'nunnery': 52007,
'sonja': 16816,
'vani': 63951,
'woods': 1408,
'spiders': 16115,
'hanging': 2345,
'woody': 2289,
'trawling': 52008,
"hold's": 52009,
'comically': 11307,
'localized': 40830,
'disobeying': 30568,
"'royale": 52010,
"harpo's": 40831,
'canet': 52011,
'aileen': 19313,
'acurately': 52012,
"diplomat's": 52013,
'rickman': 25242,
'arranged': 6746,
'rumbustious': 52014,
'familiarness': 52015,
"spider'": 52016,
'hahahah': 68804,
"wood'": 52017,
'transvestism': 40833,
"hangin'": 34702,
'bringing': 2338,
'seamier': 40834,
'wooded': 34703,
'bravora': 52018,
'grueling': 16817,
'wooden': 1636,
'wednesday': 16818,
"'prix": 52019,
'altagracia': 34704,
'circuitry': 52020,
'crotch': 11585,
'busybody': 57766,
"tart'n'tangy": 52021,
'burgade': 14129,
'thrace': 52023,
"tom's": 11038,
'snuggles': 52025,
'francesco': 29114,
'complainers': 52027,
'templarios': 52125,
'272': 40835,
'273': 52028,
'zaniacs': 52130,
'275': 34706,
'consenting': 27631,
'snuggled': 40836,
'inanimate': 15492,
'uality': 52030,
'bronte': 11926,
'errors': 4010,
'dialogs': 3230,
"yomada's": 52031,
"madman's": 34707,
'dialoge': 30585,
'usenet': 52033,
'videodrome': 40837,
"kid'": 26338,
'pawed': 52034,
"'girlfriend'": 30569,
"'pleasure": 52035,
"'reloaded'": 52036,
"kazakos'": 40839,
'rocque': 52037,
'mailings': 52038,
'brainwashed': 11927,
'mcanally': 16819,
"tom''": 52039,
'kurupt': 25243,
'affiliated': 21905,
'babaganoosh': 52040,
"noe's": 40840,
'quart': 40841,
'kids': 359,
'uplifting': 5034,
'controversy': 7093,
'kida': 21906,
'kidd': 23379,
"error'": 52041,
'neurologist': 52042,
'spotty': 18510,
'cobblers': 30570,
'projection': 9878,
'fastforwarding': 40842,
'sters': 52043,
"eggar's": 52044,
'etherything': 52045,
'gateshead': 40843,
'airball': 34708,
'unsinkable': 25244,
'stern': 7180,
"cervi's": 52046,
'dnd': 40844,
'dna': 11586,
'insecurity': 20598,
"'reboot'": 52047,
'trelkovsky': 11037,
'jaekel': 52048,
'sidebars': 52049,
"sforza's": 52050,
'distortions': 17633,
'mutinies': 52051,
'sermons': 30602,
'7ft': 40846,
'boobage': 52052,
"o'bannon's": 52053,
'populations': 23380,
'chulak': 52054,
'mesmerize': 27633,
'quinnell': 52055,
'yahoo': 10307,
'meteorologist': 52057,
'beswick': 42577,
'boorman': 15493,
'voicework': 40847,
"ster'": 52058,
'blustering': 22922,
'hj': 52059,
'intake': 27634,
'morally': 5621,
'jumbling': 40849,
'bowersock': 52060,
"'porky's'": 52061,
'gershon': 16821,
'ludicrosity': 40850,
'coprophilia': 52062,
'expressively': 40851,
"india's": 19500,
"post's": 34710,
'wana': 52063,
'wang': 5283,
'wand': 30571,
'wane': 25245,
'edgeways': 52321,
'titanium': 34711,
'pinta': 40852,
'want': 178,
'pinto': 30572,
'whoopdedoodles': 52065,
'tchaikovsky': 21908,
'travel': 2103,
"'victory'": 52066,
'copious': 11928,
'gouge': 22433,
"chapters'": 52067,
'barbra': 6702,
'uselessness': 30573,
"wan'": 52068,
'assimilated': 27635,
'petiot': 16116,
'most\x85and': 52069,
'dinosaurs': 3930,
'wrong': 352,
'seda': 52070,
'stollen': 52071,
'sentencing': 34712,
'ouroboros': 40853,
'assimilates': 40854,
'colorfully': 40855,
'glenne': 27636,
'dongen': 52072,
'subplots': 4760,
'kiloton': 52073,
'chandon': 23381,
"effect'": 34713,
'snugly': 27637,
'kuei': 40856,
'welcomed': 9092,
'dishonor': 30071,
'concurrence': 52075,
'stoicism': 23382,
"guys'": 14896,
"beroemd'": 52077,
'butcher': 6703,
"melfi's": 40857,
'aargh': 30623,
'playhouse': 20599,
'wickedly': 11308,
'fit': 1180,
'labratory': 52078,
'lifeline': 40859,
'screaming': 1927,
'fix': 4287,
'cineliterate': 52079,
'fic': 52080,
'fia': 52081,
'fig': 34714,
'fmvs': 52082,
'fie': 52083,
'reentered': 52084,
'fin': 30574,
'doctresses': 52085,
'fil': 52086,
'zucker': 12606,
'ached': 31931,
'counsil': 52088,
'paterfamilias': 52089,
'songwriter': 13885,
'shivam': 34715,
'hurting': 9654,
'effects': 299,
'slauther': 52090,
"'flame'": 52091,
'sommerset': 52092,
'interwhined': 52093,
'whacking': 27638,
'bartok': 52094,
'barton': 8775,
'frewer': 21909,
"fi'": 52095,
'ingrid': 6192,
'stribor': 30575,
'approporiately': 52096,
'wobblyhand': 52097,
'tantalisingly': 52098,
'ankylosaurus': 52099,
'parasites': 17634,
'childen': 52100,
"jenkins'": 52101,
'metafiction': 52102,
'golem': 17635,
'indiscretion': 40860,
"reeves'": 23383,
"inamorata's": 57781,
'brittannica': 52104,
'adapt': 7916,
"russo's": 30576,
'guitarists': 48246,
'abbott': 10553,
'abbots': 40861,
'lanisha': 17649,
'magickal': 40863,
'mattter': 52105,
"'willy": 52106,
'pumpkins': 34716,
'stuntpeople': 52107,
'estimate': 30577,
'ugghhh': 40864,
'gameplay': 11309,
"wern't": 52108,
"n'sync": 40865,
'sickeningly': 16117,
'chiara': 40866,
'disturbed': 4011,
'portmanteau': 40867,
'ineffectively': 52109,
"duchonvey's": 82143,
"nasty'": 37519,
'purpose': 1285,
'lazers': 52112,
'lightened': 28105,
'kaliganj': 52113,
'popularism': 52114,
"damme's": 18511,
'stylistics': 30578,
'mindgaming': 52115,
'spoilerish': 46449,
"'corny'": 52117,
'boerner': 34718,
'olds': 6792,
'bakelite': 52118,
'renovated': 27639,
'forrester': 27640,
"lumiere's": 52119,
'gaskets': 52024,
'needed': 884,
'smight': 34719,
'master': 1297,
"edie's": 25905,
'seeber': 40868,
'hiya': 52120,
'fuzziness': 52121,
'genesis': 14897,
'rewards': 12607,
'enthrall': 30579,
"'about": 40869,
"recollection's": 52122,
'mutilated': 11039,
'fatherlands': 52123,
"fischer's": 52124,
'positively': 5399,
'270': 34705,
'ahmed': 34720,
'zatoichi': 9836,
'bannister': 13886,
'anniversaries': 52127,
"helm's": 30580,
"'work'": 52128,
'exclaimed': 34721,
"'unfunny'": 52129,
'274': 52029,
'feeling': 544,
"wanda's": 52131,
'dolan': 33266,
'278': 52133,
'peacoat': 52134,
'brawny': 40870,
'mishra': 40871,
'worlders': 40872,
'protags': 52135,
'skullcap': 52136,
'dastagir': 57596,
'affairs': 5622,
'wholesome': 7799,
'hymen': 52137,
'paramedics': 25246,
'unpersons': 52138,
'heavyarms': 52139,
'affaire': 52140,
'coulisses': 52141,
'hymer': 40873,
'kremlin': 52142,
'shipments': 30581,
'pixilated': 52143,
"'00s": 30582,
'diminishing': 18512,
'cinematic': 1357,
'resonates': 14898,
'simplify': 40874,
"nature'": 40875,
'temptresses': 40876,
'reverence': 16822,
'resonated': 19502,
'dailey': 34722,
'2\x85': 52144,
'treize': 27641,
'majo': 52145,
'kiya': 21910,
'woolnough': 52146,
'thanatos': 39797,
'sandoval': 35731,
'dorama': 40879,
"o'shaughnessy": 52147,
'tech': 4988,
'fugitives': 32018,
'teck': 30583,
"'e'": 76125,
'doesn’t': 40881,
'purged': 52149,
'saying': 657,
"martians'": 41095,
'norliss': 23418,
'dickey': 27642,
'dicker': 52152,
"'sependipity": 52153,
'padded': 8422,
'ordell': 57792,
"sturges'": 40882,
'independentcritics': 52154,
'tempted': 5745,
"atkinson's": 34724,
'hounded': 25247,
'apace': 52155,
'clicked': 15494,
"'humor'": 30584,
"martino's": 17177,
"'supporting": 52156,
'warmongering': 52032,
"zemeckis's": 34725,
'lube': 21911,
'shocky': 52157,
'plate': 7476,
'plata': 40883,
'sturgess': 40884,
"nerds'": 40885,
'plato': 20600,
'plath': 34726,
'platt': 40886,
'mcnab': 52159,
'clumsiness': 27643,
'altogether': 3899,
'massacring': 42584,
'bicenntinial': 52160,
'skaal': 40887,
'droning': 14360,
'lds': 8776,
'jaguar': 21912,
"cale's": 34727,
'nicely': 1777,
'mummy': 4588,
"lot's": 18513,
'patch': 10086,
'kerkhof': 50202,
"leader's": 52161,
"'movie": 27644,
'uncomfirmed': 52162,
'heirloom': 40888,
'wrangle': 47360,
'emotion\x85': 52163,
"'stargate'": 52164,
'pinoy': 40889,
'conchatta': 40890,
'broeke': 41128,
'advisedly': 40891,
"barker's": 17636,
'descours': 52166,
'lots': 772,
'lotr': 9259,
'irs': 9879,
'lott': 52167,
'xvi': 40892,
'irk': 34728,
'irl': 52168,
'ira': 6887,
'belzer': 21913,
'irc': 52169,
'ire': 27645,
'requisites': 40893,
'discipline': 7693,
'lyoko': 52961,
'extend': 11310,
'nature': 873,
"'dickie'": 52170,
'optimist': 40894,
'lapping': 30586,
'superficial': 3900,
'vestment': 52171,
'extent': 2823,
'tendons': 52172,
"heller's": 52173,
'quagmires': 52174,
'miyako': 52175,
'moocow': 20601,
"coles'": 52176,
'lookit': 40895,
'ravenously': 52177,
'levitating': 40896,
'perfunctorily': 52178,
'lookin': 30587,
"lot'": 40898,
'lookie': 52179,
'fearlessly': 34870,
'libyan': 52181,
'fondles': 40899,
'gopher': 35714,
'wearying': 40901,
"nz's": 52182,
'minuses': 27646,
'puposelessly': 52183,
'shandling': 52184,
'decapitates': 31268,
'humming': 11929,
"'nother": 40902,
'smackdown': 21914,
'underdone': 30588,
'frf': 40903,
'triviality': 52185,
'fro': 25248,
'bothers': 8777,
"'kensington": 52186,
'much': 73,
'muco': 34730,
'wiseguy': 22615,
"richie's": 27648,
'tonino': 40904,
'unleavened': 52187,
'fry': 11587,
"'tv'": 40905,
'toning': 40906,
'obese': 14361,
'sensationalized': 30589,
'spiv': 40907,
'spit': 6259,
'arkin': 7364,
'charleton': 21915,
'jeon': 16823,
'boardroom': 21916,
'doubts': 4989,
'spin': 3084,
'hepo': 53083,
'wildcat': 27649,
'venoms': 10584,
'misconstrues': 52191,
'mesmerising': 18514,
'misconstrued': 40908,
'rescinds': 52192,
'prostrate': 52193,
'majid': 40909,
'climbed': 16479,
'canoeing': 34731,
'majin': 52195,
'animie': 57804,
'sylke': 40910,
'conditioned': 14899,
'waddell': 40911,
'3\x85': 52196,
'hyperdrive': 41188,
'conditioner': 34732,
'bricklayer': 53153,
'hong': 2576,
'memoriam': 52198,
'inventively': 30592,
"levant's": 25249,
'portobello': 20638,
'remand': 52200,
'mummified': 19504,
'honk': 27650,
'spews': 19505,
'visitations': 40912,
'mummifies': 52201,
'cavanaugh': 25250,
'zeon': 23385,
"jungle's": 40913,
'viertel': 34733,
'frenchmen': 27651,
'torpedoes': 52202,
'schlessinger': 52203,
'torpedoed': 34734,
'blister': 69876,
'cinefest': 52204,
'furlough': 34735,
'mainsequence': 52205,
'mentors': 40914,
'academic': 9094,
'stillness': 20602,
'academia': 40915,
'lonelier': 52206,
'nibby': 52207,
"losers'": 52208,
'cineastes': 40916,
'corporate': 4449,
'massaging': 40917,
'bellow': 30593,
'absurdities': 19506,
'expetations': 53241,
'nyfiken': 40918,
'mehras': 75638,
'lasse': 52209,
'visability': 52210,
'militarily': 33946,
"elder'": 52211,
'gainsbourg': 19023,
'hah': 20603,
'hai': 13420,
'haj': 34736,
'hak': 25251,
'hal': 4311,
'ham': 4892,
'duffer': 53259,
'haa': 52213,
'had': 66,
'advancement': 11930,
'hag': 16825,
"hand'": 25252,
'hay': 13421,
'mcnamara': 20604,
"mozart's": 52214,
'duffel': 30731,
'haq': 30594,
'har': 13887,
'has': 44,
'hat': 2401,
'hav': 40919,
'haw': 30595,
'figtings': 52215,
'elders': 15495,
'underpanted': 52216,
'pninson': 52217,
'unequivocally': 27652,
"barbara's": 23673,
"bello'": 52219,
'indicative': 12997,
'yawnfest': 40920,
'hexploitation': 52220,
"loder's": 52221,
'sleuthing': 27653,
"justin's": 32622,
"'ball": 52222,
"'summer": 52223,
"'demons'": 34935,
"mormon's": 52225,
"laughton's": 34737,
'debell': 52226,
'shipyard': 39724,
'unabashedly': 30597,
'disks': 40401,
'crowd': 2290,
'crowe': 10087,
"vancouver's": 56434,
'mosques': 34738,
'crown': 6627,
'culpas': 52227,
'crows': 27654,
'surrell': 53344,
'flowless': 52229,
'sheirk': 52230,
"'three": 40923,
"peterson'": 52231,
'ooverall': 52232,
'perchance': 40924,
'bottom': 1321,
'chabert': 53363,
'sneha': 52233,
'inhuman': 13888,
'ichii': 52234,
'ursla': 52235,
'completly': 30598,
'moviedom': 40925,
'raddick': 52236,
'brundage': 51995,
'brigades': 40926,
'starring': 1181,
"'goal'": 52237,
'caskets': 52238,
'willcock': 52239,
"threesome's": 52240,
"mosque'": 52241,
"cover's": 52242,
'spaceships': 17637,
'anomalous': 40927,
'ptsd': 27655,
'shirdan': 52243,
'obscenity': 21962,
'lemmings': 30599,
'duccio': 30600,
"levene's": 52244,
"'gorby'": 52245,
"teenager's": 25255,
'marshall': 5340,
'honeymoon': 9095,
'shoots': 3231,
'despised': 12258,
'okabasho': 52246,
'fabric': 8289,
'cannavale': 18515,
'raped': 3537,
"tutt's": 52247,
'grasping': 17638,
'despises': 18516,
"thief's": 40928,
'rapes': 8926,
'raper': 52248,
"eyre'": 27656,
'walchek': 52249,
"elmo's": 23386,
'perfumes': 40929,
'spurting': 21918,
"exposition'\x85": 52250,
'denoting': 52251,
'thesaurus': 34740,
"shoot'": 40930,
'bonejack': 49759,
'simpsonian': 52253,
'hebetude': 30601,
"hallow's": 34741,
'desperation\x85': 52254,
'incinerator': 34742,
'congratulations': 10308,
'humbled': 52255,
"else's": 5924,
'trelkovski': 40845,
"rape'": 52256,
"'chapters'": 59386,
'1600s': 52257,
'martian': 7253,
'nicest': 25256,
'eyred': 52259,
'passenger': 9457,
'disgrace': 6041,
'moderne': 52260,
'barrymore': 5120,
'yankovich': 52261,
'moderns': 40931,
'studliest': 52262,
'bedsheet': 52263,
'decapitation': 14900,
'slurring': 52264,
"'nunsploitation'": 52265,
"'character'": 34743,
'cambodia': 9880,
'rebelious': 52266,
'pasadena': 27657,
'crowne': 40932,
"'bedchamber": 52267,
'conjectural': 52268,
'appologize': 52269,
'halfassing': 52270,
'paycheque': 57816,
'palms': 20606,
"'islands": 52271,
'hawked': 40933,
'palme': 21919,
'conservatively': 40934,
'larp': 64007,
'palma': 5558,
'smelling': 21920,
'aragorn': 12998,
'hawker': 52272,
'hawkes': 52273,
'explosions': 3975,
'loren': 8059,
"pyle's": 52274,
'shootout': 6704,
"mike's": 18517,
"driscoll's": 52275,
'cogsworth': 40935,
"britian's": 52276,
'childs': 34744,
"portrait's": 52277,
'chain': 3626,
'whoever': 2497,
'puttered': 52278,
'childe': 52279,
'maywether': 52280,
'chair': 3036,
"rance's": 52281,
'machu': 34745,
'ballet': 4517,
'grapples': 34746,
'summerize': 76152,
'freelance': 30603,
"andrea's": 52283,
'\x91very': 52284,
'coolidge': 45879,
'mache': 18518,
'balled': 52285,
'grappled': 40937,
'macha': 18519,
'underlining': 21921,
'macho': 5623,
'oversight': 19507,
'machi': 25257,
'verbally': 11311,
'tenacious': 21922,
'windshields': 40938,
'paychecks': 18557,
'jerk': 3396,
"good'": 11931,
'prancer': 34748,
'prances': 21923,
'olympus': 52286,
'lark': 21924,
'embark': 10785,
'gloomy': 7365,
'jehaan': 52287,
'turaqui': 52288,
"child'": 20607,
'locked': 2894,
'pranced': 52289,
'exact': 2588,
'unattuned': 52290,
'minute': 783,
'skewed': 16118,
'hodgins': 40940,
'skewer': 34749,
'think\x85': 52291,
'rosenstein': 38765,
'helmit': 52292,
'wrestlemanias': 34750,
'hindered': 16826,
"martha's": 30604,
'cheree': 52293,
"pluckin'": 52294,
'ogles': 40941,
'heavyweight': 11932,
'aada': 82190,
'chopping': 11312,
'strongboy': 61534,
'hegemonic': 41342,
'adorns': 40942,
'xxth': 41346,
'nobuhiro': 34751,
'capitães': 52298,
'kavogianni': 52299,
'antwerp': 13422,
'celebrated': 6538,
'roarke': 52300,
'baggins': 40943,
'cheeseburgers': 31270,
'matras': 52301,
"nineties'": 52302,
"'craig'": 52303,
'celebrates': 12999,
'unintentionally': 3383,
'drafted': 14362,
'climby': 52304,
'303': 52305,
'oldies': 18520,
'climbs': 9096,
'honour': 9655,
'plucking': 34752,
'305': 30074,
'address': 5514,
'menjou': 40944,
"'freak'": 42592,
'dwindling': 19508,
'benson': 9458,
'white’s': 52307,
'shamelessness': 40945,
'impacted': 21925,
'upatz': 52308,
'cusack': 3840,
"flavia's": 37567,
'effette': 52309,
'influx': 34753,
'boooooooo': 52310,
'dimitrova': 52311,
'houseman': 13423,
'bigas': 25259,
'boylen': 52312,
'phillipenes': 52313,
'fakery': 40946,
"grandpa's": 27658,
'darnell': 27659,
'undergone': 19509,
'handbags': 52315,
'perished': 21926,
'pooped': 37778,
'vigour': 27660,
'opposed': 3627,
'etude': 52316,
"caine's": 11799,
'doozers': 52317,
'photojournals': 34754,
'perishes': 52318,
'constrains': 34755,
'migenes': 40948,
'consoled': 30605,
'alastair': 16827,
'wvs': 52319,
'ooooooh': 52320,
'approving': 34756,
'consoles': 40949,
'disparagement': 52064,
'futureistic': 52322,
'rebounding': 52323,
"'date": 52324,
'gregoire': 52325,
'rutherford': 21927,
'americanised': 34757,
'novikov': 82196,
'following': 1042,
'munroe': 34758,
"morita'": 52326,
'christenssen': 52327,
'oatmeal': 23106,
'fossey': 25260,
'livered': 40950,
'listens': 13000,
"'marci": 76164,
"otis's": 52330,
'thanking': 23387,
'maude': 16019,
'extensions': 34759,
'ameteurish': 52332,
"commender's": 52333,
'agricultural': 27661,
'convincingly': 4518,
'fueled': 17639,
'mahattan': 54014,
"paris's": 40952,
'vulkan': 52336,
'stapes': 52337,
'odysessy': 52338,
'harmon': 12259,
'surfing': 4252,
'halloran': 23494,
'unbelieveably': 49580,
"'offed'": 52339,
'quadrant': 30607,
'inhabiting': 19510,
'nebbish': 34760,
'forebears': 40953,
'skirmish': 34761,
'ocassionally': 52340,
"'resist": 52341,
'impactful': 21928,
'spicier': 52342,
'touristy': 40954,
"'football'": 52343,
'webpage': 40955,
'exurbia': 52345,
'jucier': 52346,
'professors': 14901,
'structuring': 34762,
'jig': 30608,
'overlord': 40956,
'disconnect': 25261,
'sniffle': 82201,
'slimeball': 40957,
'jia': 40958,
'milked': 16828,
'banjoes': 40959,
'jim': 1237,
'workforces': 52348,
'jip': 52349,
'rotweiller': 52350,
'mundaneness': 34763,
"'ninja'": 52351,
"dead'": 11040,
"cipriani's": 40960,
'modestly': 20608,
"professor'": 52352,
'shacked': 40961,
'bashful': 34764,
'sorter': 23388,
'overpowering': 16120,
'workmanlike': 18521,
'henpecked': 27662,
'sorted': 18522,
"jōb's": 52354,
"'always": 52355,
"'baptists": 34765,
'dreamcatchers': 52356,
"'silence'": 52357,
'hickory': 21929,
'fun\x97yet': 52358,
'breakumentary': 52359,
'didn': 15496,
'didi': 52360,
'pealing': 52361,
'dispite': 40962,
"italy's": 25262,
'instability': 21930,
'quarter': 6539,
'quartet': 12608,
'padmé': 52362,
"'bleedmedry": 52363,
'pahalniuk': 52364,
'honduras': 52365,
'bursting': 10786,
"pablo's": 41465,
'irremediably': 52367,
'presages': 40963,
'bowlegged': 57832,
'dalip': 65183,
'entering': 6260,
'newsradio': 76172,
'presaged': 54150,
"giallo's": 27663,
'bouyant': 40964,
'amerterish': 52368,
'rajni': 18523,
'leeves': 30610,
'macauley': 34767,
'seriously': 612,
'sugercoma': 52369,
'grimstead': 52370,
"'fairy'": 52371,
'zenda': 30611,
"'twins'": 52372,
'realisation': 17640,
'highsmith': 27664,
'raunchy': 7817,
'incentives': 40965,
'flatson': 52374,
'snooker': 35097,
'crazies': 16829,
'crazier': 14902,
'grandma': 7094,
'napunsaktha': 52375,
'workmanship': 30612,
'reisner': 52376,
"sanford's": 61306,
'\x91doña': 52377,
'modest': 6108,
"everything's": 19153,
'hamer': 40966,
"couldn't'": 52379,
'quibble': 13001,
'socking': 52380,
'tingler': 21931,
'gutman': 52381,
'lachlan': 40967,
'tableaus': 52382,
'headbanger': 52383,
'spoken': 2847,
'cerebrally': 34768,
"'road": 23490,
'tableaux': 21932,
"proust's": 40968,
'periodical': 40969,
"shoveller's": 52385,
'tamara': 25263,
'affords': 17641,
'concert': 3249,
"yara's": 87955,
'someome': 52386,
'lingering': 8424,
"abraham's": 41511,
'beesley': 34769,
'cherbourg': 34770,
'kagan': 28624,
'snatch': 9097,
"miyazaki's": 9260,
'absorbs': 25264,
"koltai's": 40970,
'tingled': 64027,
'crossroads': 19511,
'rehab': 16121,
'falworth': 52389,
'sequals': 52390,
...}
index_word = dict([(value, key) for (key, value) in word_index.items()])
index_word
{34701: 'fawn',
52006: 'tsukino',
52007: 'nunnery',
16816: 'sonja',
63951: 'vani',
1408: 'woods',
16115: 'spiders',
2345: 'hanging',
2289: 'woody',
52008: 'trawling',
52009: "hold's",
11307: 'comically',
40830: 'localized',
30568: 'disobeying',
52010: "'royale",
40831: "harpo's",
52011: 'canet',
19313: 'aileen',
52012: 'acurately',
52013: "diplomat's",
25242: 'rickman',
6746: 'arranged',
52014: 'rumbustious',
52015: 'familiarness',
52016: "spider'",
68804: 'hahahah',
52017: "wood'",
40833: 'transvestism',
34702: "hangin'",
2338: 'bringing',
40834: 'seamier',
34703: 'wooded',
52018: 'bravora',
16817: 'grueling',
1636: 'wooden',
16818: 'wednesday',
52019: "'prix",
34704: 'altagracia',
52020: 'circuitry',
11585: 'crotch',
57766: 'busybody',
52021: "tart'n'tangy",
14129: 'burgade',
52023: 'thrace',
11038: "tom's",
52025: 'snuggles',
29114: 'francesco',
52027: 'complainers',
52125: 'templarios',
40835: '272',
52028: '273',
52130: 'zaniacs',
34706: '275',
27631: 'consenting',
40836: 'snuggled',
15492: 'inanimate',
52030: 'uality',
11926: 'bronte',
4010: 'errors',
3230: 'dialogs',
52031: "yomada's",
34707: "madman's",
30585: 'dialoge',
52033: 'usenet',
40837: 'videodrome',
26338: "kid'",
52034: 'pawed',
30569: "'girlfriend'",
52035: "'pleasure",
52036: "'reloaded'",
40839: "kazakos'",
52037: 'rocque',
52038: 'mailings',
11927: 'brainwashed',
16819: 'mcanally',
52039: "tom''",
25243: 'kurupt',
21905: 'affiliated',
52040: 'babaganoosh',
40840: "noe's",
40841: 'quart',
359: 'kids',
5034: 'uplifting',
7093: 'controversy',
21906: 'kida',
23379: 'kidd',
52041: "error'",
52042: 'neurologist',
18510: 'spotty',
30570: 'cobblers',
9878: 'projection',
40842: 'fastforwarding',
52043: 'sters',
52044: "eggar's",
52045: 'etherything',
40843: 'gateshead',
34708: 'airball',
25244: 'unsinkable',
7180: 'stern',
52046: "cervi's",
40844: 'dnd',
11586: 'dna',
20598: 'insecurity',
52047: "'reboot'",
11037: 'trelkovsky',
52048: 'jaekel',
52049: 'sidebars',
52050: "sforza's",
17633: 'distortions',
52051: 'mutinies',
30602: 'sermons',
40846: '7ft',
52052: 'boobage',
52053: "o'bannon's",
23380: 'populations',
52054: 'chulak',
27633: 'mesmerize',
52055: 'quinnell',
10307: 'yahoo',
52057: 'meteorologist',
42577: 'beswick',
15493: 'boorman',
40847: 'voicework',
52058: "ster'",
22922: 'blustering',
52059: 'hj',
27634: 'intake',
5621: 'morally',
40849: 'jumbling',
52060: 'bowersock',
52061: "'porky's'",
16821: 'gershon',
40850: 'ludicrosity',
52062: 'coprophilia',
40851: 'expressively',
19500: "india's",
34710: "post's",
52063: 'wana',
5283: 'wang',
30571: 'wand',
25245: 'wane',
52321: 'edgeways',
34711: 'titanium',
40852: 'pinta',
178: 'want',
30572: 'pinto',
52065: 'whoopdedoodles',
21908: 'tchaikovsky',
2103: 'travel',
52066: "'victory'",
11928: 'copious',
22433: 'gouge',
52067: "chapters'",
6702: 'barbra',
30573: 'uselessness',
52068: "wan'",
27635: 'assimilated',
16116: 'petiot',
52069: 'most\x85and',
3930: 'dinosaurs',
352: 'wrong',
52070: 'seda',
52071: 'stollen',
34712: 'sentencing',
40853: 'ouroboros',
40854: 'assimilates',
40855: 'colorfully',
27636: 'glenne',
52072: 'dongen',
4760: 'subplots',
52073: 'kiloton',
23381: 'chandon',
34713: "effect'",
27637: 'snugly',
40856: 'kuei',
9092: 'welcomed',
30071: 'dishonor',
52075: 'concurrence',
23382: 'stoicism',
14896: "guys'",
52077: "beroemd'",
6703: 'butcher',
40857: "melfi's",
30623: 'aargh',
20599: 'playhouse',
11308: 'wickedly',
1180: 'fit',
52078: 'labratory',
40859: 'lifeline',
1927: 'screaming',
4287: 'fix',
52079: 'cineliterate',
52080: 'fic',
52081: 'fia',
34714: 'fig',
52082: 'fmvs',
52083: 'fie',
52084: 'reentered',
30574: 'fin',
52085: 'doctresses',
52086: 'fil',
12606: 'zucker',
31931: 'ached',
52088: 'counsil',
52089: 'paterfamilias',
13885: 'songwriter',
34715: 'shivam',
9654: 'hurting',
299: 'effects',
52090: 'slauther',
52091: "'flame'",
52092: 'sommerset',
52093: 'interwhined',
27638: 'whacking',
52094: 'bartok',
8775: 'barton',
21909: 'frewer',
52095: "fi'",
6192: 'ingrid',
30575: 'stribor',
52096: 'approporiately',
52097: 'wobblyhand',
52098: 'tantalisingly',
52099: 'ankylosaurus',
17634: 'parasites',
52100: 'childen',
52101: "jenkins'",
52102: 'metafiction',
17635: 'golem',
40860: 'indiscretion',
23383: "reeves'",
57781: "inamorata's",
52104: 'brittannica',
7916: 'adapt',
30576: "russo's",
48246: 'guitarists',
10553: 'abbott',
40861: 'abbots',
17649: 'lanisha',
40863: 'magickal',
52105: 'mattter',
52106: "'willy",
34716: 'pumpkins',
52107: 'stuntpeople',
30577: 'estimate',
40864: 'ugghhh',
11309: 'gameplay',
52108: "wern't",
40865: "n'sync",
16117: 'sickeningly',
40866: 'chiara',
4011: 'disturbed',
40867: 'portmanteau',
52109: 'ineffectively',
82143: "duchonvey's",
37519: "nasty'",
1285: 'purpose',
52112: 'lazers',
28105: 'lightened',
52113: 'kaliganj',
52114: 'popularism',
18511: "damme's",
30578: 'stylistics',
52115: 'mindgaming',
46449: 'spoilerish',
52117: "'corny'",
34718: 'boerner',
6792: 'olds',
52118: 'bakelite',
27639: 'renovated',
27640: 'forrester',
52119: "lumiere's",
52024: 'gaskets',
884: 'needed',
34719: 'smight',
1297: 'master',
25905: "edie's",
40868: 'seeber',
52120: 'hiya',
52121: 'fuzziness',
14897: 'genesis',
12607: 'rewards',
30579: 'enthrall',
40869: "'about",
52122: "recollection's",
11039: 'mutilated',
52123: 'fatherlands',
52124: "fischer's",
5399: 'positively',
34705: '270',
34720: 'ahmed',
9836: 'zatoichi',
13886: 'bannister',
52127: 'anniversaries',
30580: "helm's",
52128: "'work'",
34721: 'exclaimed',
52129: "'unfunny'",
52029: '274',
544: 'feeling',
52131: "wanda's",
33266: 'dolan',
52133: '278',
52134: 'peacoat',
40870: 'brawny',
40871: 'mishra',
40872: 'worlders',
52135: 'protags',
52136: 'skullcap',
57596: 'dastagir',
5622: 'affairs',
7799: 'wholesome',
52137: 'hymen',
25246: 'paramedics',
52138: 'unpersons',
52139: 'heavyarms',
52140: 'affaire',
52141: 'coulisses',
40873: 'hymer',
52142: 'kremlin',
30581: 'shipments',
52143: 'pixilated',
30582: "'00s",
18512: 'diminishing',
1357: 'cinematic',
14898: 'resonates',
40874: 'simplify',
40875: "nature'",
40876: 'temptresses',
16822: 'reverence',
19502: 'resonated',
34722: 'dailey',
52144: '2\x85',
27641: 'treize',
52145: 'majo',
21910: 'kiya',
52146: 'woolnough',
39797: 'thanatos',
35731: 'sandoval',
40879: 'dorama',
52147: "o'shaughnessy",
4988: 'tech',
32018: 'fugitives',
30583: 'teck',
76125: "'e'",
40881: 'doesn’t',
52149: 'purged',
657: 'saying',
41095: "martians'",
23418: 'norliss',
27642: 'dickey',
52152: 'dicker',
52153: "'sependipity",
8422: 'padded',
57792: 'ordell',
40882: "sturges'",
52154: 'independentcritics',
5745: 'tempted',
34724: "atkinson's",
25247: 'hounded',
52155: 'apace',
15494: 'clicked',
30584: "'humor'",
17177: "martino's",
52156: "'supporting",
52032: 'warmongering',
34725: "zemeckis's",
21911: 'lube',
52157: 'shocky',
7476: 'plate',
40883: 'plata',
40884: 'sturgess',
40885: "nerds'",
20600: 'plato',
34726: 'plath',
40886: 'platt',
52159: 'mcnab',
27643: 'clumsiness',
3899: 'altogether',
42584: 'massacring',
52160: 'bicenntinial',
40887: 'skaal',
14360: 'droning',
8776: 'lds',
21912: 'jaguar',
34727: "cale's",
1777: 'nicely',
4588: 'mummy',
18513: "lot's",
10086: 'patch',
50202: 'kerkhof',
52161: "leader's",
27644: "'movie",
52162: 'uncomfirmed',
40888: 'heirloom',
47360: 'wrangle',
52163: 'emotion\x85',
52164: "'stargate'",
40889: 'pinoy',
40890: 'conchatta',
41128: 'broeke',
40891: 'advisedly',
17636: "barker's",
52166: 'descours',
772: 'lots',
9259: 'lotr',
9879: 'irs',
52167: 'lott',
40892: 'xvi',
34728: 'irk',
52168: 'irl',
6887: 'ira',
21913: 'belzer',
52169: 'irc',
27645: 'ire',
40893: 'requisites',
7693: 'discipline',
52961: 'lyoko',
11310: 'extend',
873: 'nature',
52170: "'dickie'",
40894: 'optimist',
30586: 'lapping',
3900: 'superficial',
52171: 'vestment',
2823: 'extent',
52172: 'tendons',
52173: "heller's",
52174: 'quagmires',
52175: 'miyako',
20601: 'moocow',
52176: "coles'",
40895: 'lookit',
52177: 'ravenously',
40896: 'levitating',
52178: 'perfunctorily',
30587: 'lookin',
40898: "lot'",
52179: 'lookie',
34870: 'fearlessly',
52181: 'libyan',
40899: 'fondles',
35714: 'gopher',
40901: 'wearying',
52182: "nz's",
27646: 'minuses',
52183: 'puposelessly',
52184: 'shandling',
31268: 'decapitates',
11929: 'humming',
40902: "'nother",
21914: 'smackdown',
30588: 'underdone',
40903: 'frf',
52185: 'triviality',
25248: 'fro',
8777: 'bothers',
52186: "'kensington",
73: 'much',
34730: 'muco',
22615: 'wiseguy',
27648: "richie's",
40904: 'tonino',
52187: 'unleavened',
11587: 'fry',
40905: "'tv'",
40906: 'toning',
14361: 'obese',
30589: 'sensationalized',
40907: 'spiv',
6259: 'spit',
7364: 'arkin',
21915: 'charleton',
16823: 'jeon',
21916: 'boardroom',
4989: 'doubts',
3084: 'spin',
53083: 'hepo',
27649: 'wildcat',
10584: 'venoms',
52191: 'misconstrues',
18514: 'mesmerising',
40908: 'misconstrued',
52192: 'rescinds',
52193: 'prostrate',
40909: 'majid',
16479: 'climbed',
34731: 'canoeing',
52195: 'majin',
57804: 'animie',
40910: 'sylke',
14899: 'conditioned',
40911: 'waddell',
52196: '3\x85',
41188: 'hyperdrive',
34732: 'conditioner',
53153: 'bricklayer',
2576: 'hong',
52198: 'memoriam',
30592: 'inventively',
25249: "levant's",
20638: 'portobello',
52200: 'remand',
19504: 'mummified',
27650: 'honk',
19505: 'spews',
40912: 'visitations',
52201: 'mummifies',
25250: 'cavanaugh',
23385: 'zeon',
40913: "jungle's",
34733: 'viertel',
27651: 'frenchmen',
52202: 'torpedoes',
52203: 'schlessinger',
34734: 'torpedoed',
69876: 'blister',
52204: 'cinefest',
34735: 'furlough',
52205: 'mainsequence',
40914: 'mentors',
9094: 'academic',
20602: 'stillness',
40915: 'academia',
52206: 'lonelier',
52207: 'nibby',
52208: "losers'",
40916: 'cineastes',
4449: 'corporate',
40917: 'massaging',
30593: 'bellow',
19506: 'absurdities',
53241: 'expetations',
40918: 'nyfiken',
75638: 'mehras',
52209: 'lasse',
52210: 'visability',
33946: 'militarily',
52211: "elder'",
19023: 'gainsbourg',
20603: 'hah',
13420: 'hai',
34736: 'haj',
25251: 'hak',
4311: 'hal',
4892: 'ham',
53259: 'duffer',
52213: 'haa',
66: 'had',
11930: 'advancement',
16825: 'hag',
25252: "hand'",
13421: 'hay',
20604: 'mcnamara',
52214: "mozart's",
30731: 'duffel',
30594: 'haq',
13887: 'har',
44: 'has',
2401: 'hat',
40919: 'hav',
30595: 'haw',
52215: 'figtings',
15495: 'elders',
52216: 'underpanted',
52217: 'pninson',
27652: 'unequivocally',
23673: "barbara's",
52219: "bello'",
12997: 'indicative',
40920: 'yawnfest',
52220: 'hexploitation',
52221: "loder's",
27653: 'sleuthing',
32622: "justin's",
52222: "'ball",
52223: "'summer",
34935: "'demons'",
52225: "mormon's",
34737: "laughton's",
52226: 'debell',
39724: 'shipyard',
30597: 'unabashedly',
40401: 'disks',
2290: 'crowd',
10087: 'crowe',
56434: "vancouver's",
34738: 'mosques',
6627: 'crown',
52227: 'culpas',
27654: 'crows',
53344: 'surrell',
52229: 'flowless',
52230: 'sheirk',
40923: "'three",
52231: "peterson'",
52232: 'ooverall',
40924: 'perchance',
1321: 'bottom',
53363: 'chabert',
52233: 'sneha',
13888: 'inhuman',
52234: 'ichii',
52235: 'ursla',
30598: 'completly',
40925: 'moviedom',
52236: 'raddick',
51995: 'brundage',
40926: 'brigades',
1181: 'starring',
52237: "'goal'",
52238: 'caskets',
52239: 'willcock',
52240: "threesome's",
52241: "mosque'",
52242: "cover's",
17637: 'spaceships',
40927: 'anomalous',
27655: 'ptsd',
52243: 'shirdan',
21962: 'obscenity',
30599: 'lemmings',
30600: 'duccio',
52244: "levene's",
52245: "'gorby'",
25255: "teenager's",
5340: 'marshall',
9095: 'honeymoon',
3231: 'shoots',
12258: 'despised',
52246: 'okabasho',
8289: 'fabric',
18515: 'cannavale',
3537: 'raped',
52247: "tutt's",
17638: 'grasping',
18516: 'despises',
40928: "thief's",
8926: 'rapes',
52248: 'raper',
27656: "eyre'",
52249: 'walchek',
23386: "elmo's",
40929: 'perfumes',
21918: 'spurting',
52250: "exposition'\x85",
52251: 'denoting',
34740: 'thesaurus',
40930: "shoot'",
49759: 'bonejack',
52253: 'simpsonian',
30601: 'hebetude',
34741: "hallow's",
52254: 'desperation\x85',
34742: 'incinerator',
10308: 'congratulations',
52255: 'humbled',
5924: "else's",
40845: 'trelkovski',
52256: "rape'",
59386: "'chapters'",
52257: '1600s',
7253: 'martian',
25256: 'nicest',
52259: 'eyred',
9457: 'passenger',
6041: 'disgrace',
52260: 'moderne',
5120: 'barrymore',
52261: 'yankovich',
40931: 'moderns',
52262: 'studliest',
52263: 'bedsheet',
14900: 'decapitation',
52264: 'slurring',
52265: "'nunsploitation'",
34743: "'character'",
9880: 'cambodia',
52266: 'rebelious',
27657: 'pasadena',
40932: 'crowne',
52267: "'bedchamber",
52268: 'conjectural',
52269: 'appologize',
52270: 'halfassing',
57816: 'paycheque',
20606: 'palms',
52271: "'islands",
40933: 'hawked',
21919: 'palme',
40934: 'conservatively',
64007: 'larp',
5558: 'palma',
21920: 'smelling',
12998: 'aragorn',
52272: 'hawker',
52273: 'hawkes',
3975: 'explosions',
8059: 'loren',
52274: "pyle's",
6704: 'shootout',
18517: "mike's",
52275: "driscoll's",
40935: 'cogsworth',
52276: "britian's",
34744: 'childs',
52277: "portrait's",
3626: 'chain',
2497: 'whoever',
52278: 'puttered',
52279: 'childe',
52280: 'maywether',
3036: 'chair',
52281: "rance's",
34745: 'machu',
4517: 'ballet',
34746: 'grapples',
76152: 'summerize',
30603: 'freelance',
52283: "andrea's",
52284: '\x91very',
45879: 'coolidge',
18518: 'mache',
52285: 'balled',
40937: 'grappled',
18519: 'macha',
21921: 'underlining',
5623: 'macho',
19507: 'oversight',
25257: 'machi',
11311: 'verbally',
21922: 'tenacious',
40938: 'windshields',
18557: 'paychecks',
3396: 'jerk',
11931: "good'",
34748: 'prancer',
21923: 'prances',
52286: 'olympus',
21924: 'lark',
10785: 'embark',
7365: 'gloomy',
52287: 'jehaan',
52288: 'turaqui',
20607: "child'",
2894: 'locked',
52289: 'pranced',
2588: 'exact',
52290: 'unattuned',
783: 'minute',
16118: 'skewed',
40940: 'hodgins',
34749: 'skewer',
52291: 'think\x85',
38765: 'rosenstein',
52292: 'helmit',
34750: 'wrestlemanias',
16826: 'hindered',
30604: "martha's",
52293: 'cheree',
52294: "pluckin'",
40941: 'ogles',
11932: 'heavyweight',
82190: 'aada',
11312: 'chopping',
61534: 'strongboy',
41342: 'hegemonic',
40942: 'adorns',
41346: 'xxth',
34751: 'nobuhiro',
52298: 'capitães',
52299: 'kavogianni',
13422: 'antwerp',
6538: 'celebrated',
52300: 'roarke',
40943: 'baggins',
31270: 'cheeseburgers',
52301: 'matras',
52302: "nineties'",
52303: "'craig'",
12999: 'celebrates',
3383: 'unintentionally',
14362: 'drafted',
52304: 'climby',
52305: '303',
18520: 'oldies',
9096: 'climbs',
9655: 'honour',
34752: 'plucking',
30074: '305',
5514: 'address',
40944: 'menjou',
42592: "'freak'",
19508: 'dwindling',
9458: 'benson',
52307: 'white’s',
40945: 'shamelessness',
21925: 'impacted',
52308: 'upatz',
3840: 'cusack',
37567: "flavia's",
52309: 'effette',
34753: 'influx',
52310: 'boooooooo',
52311: 'dimitrova',
13423: 'houseman',
25259: 'bigas',
52312: 'boylen',
52313: 'phillipenes',
40946: 'fakery',
27658: "grandpa's",
27659: 'darnell',
19509: 'undergone',
52315: 'handbags',
21926: 'perished',
37778: 'pooped',
27660: 'vigour',
3627: 'opposed',
52316: 'etude',
11799: "caine's",
52317: 'doozers',
34754: 'photojournals',
52318: 'perishes',
34755: 'constrains',
40948: 'migenes',
30605: 'consoled',
16827: 'alastair',
52319: 'wvs',
52320: 'ooooooh',
34756: 'approving',
40949: 'consoles',
52064: 'disparagement',
52322: 'futureistic',
52323: 'rebounding',
52324: "'date",
52325: 'gregoire',
21927: 'rutherford',
34757: 'americanised',
82196: 'novikov',
1042: 'following',
34758: 'munroe',
52326: "morita'",
52327: 'christenssen',
23106: 'oatmeal',
25260: 'fossey',
40950: 'livered',
13000: 'listens',
76164: "'marci",
52330: "otis's",
23387: 'thanking',
16019: 'maude',
34759: 'extensions',
52332: 'ameteurish',
52333: "commender's",
27661: 'agricultural',
4518: 'convincingly',
17639: 'fueled',
54014: 'mahattan',
40952: "paris's",
52336: 'vulkan',
52337: 'stapes',
52338: 'odysessy',
12259: 'harmon',
4252: 'surfing',
23494: 'halloran',
49580: 'unbelieveably',
52339: "'offed'",
30607: 'quadrant',
19510: 'inhabiting',
34760: 'nebbish',
40953: 'forebears',
34761: 'skirmish',
52340: 'ocassionally',
52341: "'resist",
21928: 'impactful',
52342: 'spicier',
40954: 'touristy',
52343: "'football'",
40955: 'webpage',
52345: 'exurbia',
52346: 'jucier',
14901: 'professors',
34762: 'structuring',
30608: 'jig',
40956: 'overlord',
25261: 'disconnect',
82201: 'sniffle',
40957: 'slimeball',
40958: 'jia',
16828: 'milked',
40959: 'banjoes',
1237: 'jim',
52348: 'workforces',
52349: 'jip',
52350: 'rotweiller',
34763: 'mundaneness',
52351: "'ninja'",
11040: "dead'",
40960: "cipriani's",
20608: 'modestly',
52352: "professor'",
40961: 'shacked',
34764: 'bashful',
23388: 'sorter',
16120: 'overpowering',
18521: 'workmanlike',
27662: 'henpecked',
18522: 'sorted',
52354: "jōb's",
52355: "'always",
34765: "'baptists",
52356: 'dreamcatchers',
52357: "'silence'",
21929: 'hickory',
52358: 'fun\x97yet',
52359: 'breakumentary',
15496: 'didn',
52360: 'didi',
52361: 'pealing',
40962: 'dispite',
25262: "italy's",
21930: 'instability',
6539: 'quarter',
12608: 'quartet',
52362: 'padmé',
52363: "'bleedmedry",
52364: 'pahalniuk',
52365: 'honduras',
10786: 'bursting',
41465: "pablo's",
52367: 'irremediably',
40963: 'presages',
57832: 'bowlegged',
65183: 'dalip',
6260: 'entering',
76172: 'newsradio',
54150: 'presaged',
27663: "giallo's",
40964: 'bouyant',
52368: 'amerterish',
18523: 'rajni',
30610: 'leeves',
34767: 'macauley',
612: 'seriously',
52369: 'sugercoma',
52370: 'grimstead',
52371: "'fairy'",
30611: 'zenda',
52372: "'twins'",
17640: 'realisation',
27664: 'highsmith',
7817: 'raunchy',
40965: 'incentives',
52374: 'flatson',
35097: 'snooker',
16829: 'crazies',
14902: 'crazier',
7094: 'grandma',
52375: 'napunsaktha',
30612: 'workmanship',
52376: 'reisner',
61306: "sanford's",
52377: '\x91doña',
6108: 'modest',
19153: "everything's",
40966: 'hamer',
52379: "couldn't'",
13001: 'quibble',
52380: 'socking',
21931: 'tingler',
52381: 'gutman',
40967: 'lachlan',
52382: 'tableaus',
52383: 'headbanger',
2847: 'spoken',
34768: 'cerebrally',
23490: "'road",
21932: 'tableaux',
40968: "proust's",
40969: 'periodical',
52385: "shoveller's",
25263: 'tamara',
17641: 'affords',
3249: 'concert',
87955: "yara's",
52386: 'someome',
8424: 'lingering',
41511: "abraham's",
34769: 'beesley',
34770: 'cherbourg',
28624: 'kagan',
9097: 'snatch',
9260: "miyazaki's",
25264: 'absorbs',
40970: "koltai's",
64027: 'tingled',
19511: 'crossroads',
16121: 'rehab',
52389: 'falworth',
52390: 'sequals',
...}
# 1순위
index_word[1]
# 2순위
index_word[2]
...
Q. 25번째 단어를 키로 지정, word_index에 어떤 Value가 있는지 확인
word_25th = index_word[25] print(word_25th)
review = ' '.join([str(i) for i in train_data[0]])
review
'1 14 22 16 43 530 973 1622 1385 65 458 4468 66 3941 4 173 36 256 5 25 100 43 838 112 50 670 2 9 35 480 284 5 150 4 172 112 167 2 336 385 39 4 172 4536 1111 17 546 38 13 447 4 192 50 16 6 147 2025 19 14 22 4 1920 4613 469 4 22 71 87 12 16 43 530 38 76 15 13 1247 4 22 17 515 17 12 16 626 18 2 5 62 386 12 8 316 8 106 5 4 2223 5244 16 480 66 3785 33 4 130 12 16 38 619 5 25 124 51 36 135 48 25 1415 33 6 22 12 215 28 77 52 5 14 407 16 82 2 8 4 107 117 5952 15 256 4 2 7 3766 5 723 36 71 43 530 476 26 400 317 46 7 4 2 1029 13 104 88 4 381 15 297 98 32 2071 56 26 141 6 194 7486 18 4 226 22 21 134 476 26 480 5 144 30 5535 18 51 36 28 224 92 25 104 4 226 65 16 38 1334 88 12 16 283 5 16 4472 113 103 32 15 16 5345 19 178 32'
review = ' '.join([index_word.get(i-3, '?') for i in train_data[0]])
review
"? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"
def one_hot_encoding(data, dim=10000): # imdb 데이터의 num_words를 10000으로 설정해서 dim도 10000으로 맞춰줍니다.
results = np.zeros((len(data), dim))
for i, d in enumerate(data):
results[i, d] = 1.
return results
x_train = one_hot_encoding(train_data)
x_test = one_hot_encoding(test_data)
print(x_train[0])
print(train_labels[0])
print(test_labels[0])
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
print(y_train[0])
print(y_test[0])
import tensorflow as tf
from tensorflow.keras import models, layers
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000, ), name='input'))
model.add(layers.Dense(16, activation='relu', name='hidden'))
model.add(layers.Dense(1, activation='sigmoid', name='output'))
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
history = model.fit(x_train, y_train,
epochs=20,
batch_size=512,
validation_data=(x_test, y_test))
Epoch 1/20
49/49 [==============================] - 8s 122ms/step - loss: 0.4458 - accuracy: 0.8264 - val_loss: 0.3471 - val_accuracy: 0.8704
Epoch 2/20
49/49 [==============================] - 1s 18ms/step - loss: 0.2519 - accuracy: 0.9116 - val_loss: 0.2916 - val_accuracy: 0.8846
Epoch 3/20
49/49 [==============================] - 1s 18ms/step - loss: 0.1970 - accuracy: 0.9283 - val_loss: 0.2831 - val_accuracy: 0.8876
Epoch 4/20
49/49 [==============================] - 1s 18ms/step - loss: 0.1643 - accuracy: 0.9426 - val_loss: 0.3033 - val_accuracy: 0.8816
Epoch 5/20
49/49 [==============================] - 1s 17ms/step - loss: 0.1457 - accuracy: 0.9491 - val_loss: 0.3111 - val_accuracy: 0.8794
Epoch 6/20
49/49 [==============================] - 1s 18ms/step - loss: 0.1257 - accuracy: 0.9572 - val_loss: 0.3398 - val_accuracy: 0.8741
Epoch 7/20
49/49 [==============================] - 1s 18ms/step - loss: 0.1108 - accuracy: 0.9628 - val_loss: 0.3692 - val_accuracy: 0.8705
Epoch 8/20
49/49 [==============================] - 1s 18ms/step - loss: 0.0986 - accuracy: 0.9674 - val_loss: 0.3972 - val_accuracy: 0.8651
Epoch 9/20
49/49 [==============================] - 1s 17ms/step - loss: 0.0895 - accuracy: 0.9710 - val_loss: 0.4187 - val_accuracy: 0.8634
Epoch 10/20
49/49 [==============================] - 1s 18ms/step - loss: 0.0784 - accuracy: 0.9735 - val_loss: 0.4810 - val_accuracy: 0.8520
Epoch 11/20
49/49 [==============================] - 1s 19ms/step - loss: 0.0693 - accuracy: 0.9782 - val_loss: 0.4636 - val_accuracy: 0.8620
Epoch 12/20
49/49 [==============================] - 1s 21ms/step - loss: 0.0612 - accuracy: 0.9821 - val_loss: 0.5248 - val_accuracy: 0.8516
Epoch 13/20
49/49 [==============================] - 1s 19ms/step - loss: 0.0531 - accuracy: 0.9850 - val_loss: 0.5239 - val_accuracy: 0.8580
Epoch 14/20
49/49 [==============================] - 1s 17ms/step - loss: 0.0482 - accuracy: 0.9860 - val_loss: 0.5636 - val_accuracy: 0.8576
Epoch 15/20
49/49 [==============================] - 1s 18ms/step - loss: 0.0430 - accuracy: 0.9872 - val_loss: 0.5901 - val_accuracy: 0.8561
Epoch 16/20
49/49 [==============================] - 1s 19ms/step - loss: 0.0347 - accuracy: 0.9909 - val_loss: 0.6601 - val_accuracy: 0.8500
Epoch 17/20
49/49 [==============================] - 1s 18ms/step - loss: 0.0321 - accuracy: 0.9910 - val_loss: 0.6656 - val_accuracy: 0.8506
Epoch 18/20
49/49 [==============================] - 1s 17ms/step - loss: 0.0263 - accuracy: 0.9931 - val_loss: 0.7486 - val_accuracy: 0.8489
Epoch 19/20
49/49 [==============================] - 1s 18ms/step - loss: 0.0233 - accuracy: 0.9936 - val_loss: 0.7412 - val_accuracy: 0.8511
Epoch 20/20
49/49 [==============================] - 1s 17ms/step - loss: 0.0196 - accuracy: 0.9952 - val_loss: 0.7769 - val_accuracy: 0.8479
import matplotlib.pyplot as plt
history_dict = history.history
loss = history_dict['loss']
val_loss = history_dict['val_loss']
epochs = range(1, len(loss) + 1)
fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(epochs, loss, color='blue', label='train_loss')
ax1.plot(epochs, val_loss, color='red', label='val_loss')
ax1.set_title('Train and Validation Loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.grid()
ax1.legend()
accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']
ax2 = fig.add_subplot(1, 2, 2)
ax2.plot(epochs, accuracy, color='blue', label='train_accuracy')
ax2.plot(epochs, val_accuracy, color='red', label='val_accuracy')
ax2.set_title('Train and Validation Accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.grid()
ax2.legend()
plt.show()
분석
- val_loss 증가 추세
- val_accuracy 감소 추세
- OverFitting(과대적합)
A. 다양한 학습 데이터 수집&학습, 파라미터가 적은 모델의 선택, 학습 데이터 특성 수 줄이기 등 ➡️ 모델 단순화 작업을 시도해볼 것 같음