Teaching AI about human knowledge
Ines Montani, Explosion AI

Explosion AI is a digital studio specialising in Artificial Intelligence and Natural Language Processing.
Open-source library for industrial-strength Natural Language Processing
spaCy's next-generation Machine Learning library for deep learning with text
Data Store (coming soon): pre-trained, customisable models for a variety of languages and domains

Machine learning is programming by example.
Examples are your source code, training is compilation.
Diagram labels: examples, labels, input, prediction, training
draw examples from the same distribution as the runtime inputs
goal: system's prediction given some input matches the label a human would have assigned

How machines "learn"
Example: training a simple part-of-speech tagger with the perceptron algorithm (teach the model to recognise verbs, nouns, etc.)

    from collections import defaultdict
    from numpy import zeros

    def train_tagger(examples, n_tags):
        # examples = words, tags, contexts; each human_tag is an integer tag index
        W = defaultdict(lambda: zeros(n_tags))  # the weights we'll train
        for (word, prev, next), human_tag in examples:
            scores = W[word] + W[prev] + W[next]  # score each tag given weights & context
            guess = scores.argmax()  # get the best-scoring tag
            if guess != human_tag:  # if the guess was wrong, adjust weights
                for feat in (word, prev, next):
                    W[feat][guess] -= 1  # decrease score for bad tag in this context
                    W[feat][human_tag] += 1  # increase score for good tag in this context
        return W

The bottleneck in AI is data, not algorithms.
Algorithms are general, training data is specific.
data quality, data quantity and accuracy problems are still the biggest problems in AI (Source: The State of AI survey)
you can extract knowledge from all kinds of sources, e.g. sentiment from emoji on Reddit
you usually need at least some data specific to your problem, annotated by humans

Where human knowledge in AI really comes from
Mechanical Turk
human annotators
~$5 per hour
boring tasks
low incentives
Images: Amazon Mechanical Turk, depressing.org

Don't expect great data if you're boring the shit out of underpaid people.

Why are we "designing around" this?
"Taking a HIT: Designing around Rejection, Mistrust, Risk, and Workers' Experiences in Amazon Mechanical Turk" (McInnis et al., 2016)
data collection needs the same treatment as all other human-facing processes
good UX + purpose + incentives = better quality

SOLUTION #1
UX-driven data collection with active learning
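
The slides name active learning without spelling out a selection strategy. As a rough illustration only, the sketch below uses margin-based uncertainty sampling, which is one common choice and an assumption here, not necessarily what the talk goes on to describe. The names uncertainty, pick_next_examples and FAKE_MODEL are hypothetical, and FAKE_MODEL is a hard-coded stand-in for a real model's tag probabilities.

```python
def uncertainty(probs):
    """Margin-based uncertainty: 1 minus the gap between the two highest probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def pick_next_examples(unlabeled, predict_probs, batch_size=2):
    """Return the batch_size examples the model is least sure about."""
    ranked = sorted(
        unlabeled,
        key=lambda example: uncertainty(predict_probs(example)),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical stand-in for a trained model: fixed [verb, noun] probabilities per word.
FAKE_MODEL = {
    "run": [0.51, 0.49],    # nearly a coin flip
    "table": [0.10, 0.90],  # confidently a noun
    "walk": [0.60, 0.40],
    "eat": [0.95, 0.05],    # confidently a verb
}

# Ask the human annotator about the most ambiguous words first.
queue = pick_next_examples(list(FAKE_MODEL), FAKE_MODEL.get, batch_size=2)
print(queue)  # ['run', 'walk']
```

The idea is that annotators spend their time on the examples where a human label changes the model most, rather than confirming predictions it already gets right.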