From: anon Date: Mon, 6 Jun 2022 16:05:07 +0000 (+0200) Subject: X1 final waifu paper X-Git-Url: https://git.hentai-ai.org/?a=commitdiff_plain;h=refs%2Fheads%2Fmaster;p=papers%2FwAiFu.git%2F.git X1 final waifu paper --- diff --git a/wAiFu.aux b/wAiFu.aux index 9fa7d92..a260fb0 100644 --- a/wAiFu.aux +++ b/wAiFu.aux @@ -70,24 +70,23 @@ \@writefile{toc}{\contentsline {section}{\numberline {VI}Results}{4}{section.6}\protected@file@percent } \newlabel{sec:results}{{VI}{4}{Results}{section.6}{}} \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VI-A}}Justifying Additional Transforms}{4}{subsection.6.1}\protected@file@percent } -\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VI-B}}Error Rate of Thighs}{4}{subsection.6.2}\protected@file@percent } \@writefile{lof}{\contentsline {figure}{\numberline {8}{\ignorespaces Training image without batch transforms}}{5}{figure.8}\protected@file@percent } \newlabel{fig:wobt}{{8}{5}{Training image without batch transforms}{figure.8}{}} \@writefile{lof}{\contentsline {figure}{\numberline {9}{\ignorespaces Training image with batch transforms}}{5}{figure.9}\protected@file@percent } \newlabel{fig:wbt}{{9}{5}{Training image with batch transforms}{figure.9}{}} -\@writefile{toc}{\contentsline {section}{\numberline {VII}Discussion}{5}{section.7}\protected@file@percent } -\newlabel{sec:discussion}{{VII}{5}{Discussion}{section.7}{}} -\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VII-A}}Limitations}{5}{subsection.7.1}\protected@file@percent } -\newlabel{sec:limitations}{{\mbox {VII-A}}{5}{Limitations}{subsection.7.1}{}} +\@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Comparing with and without batch transforms on error\_rate, train\_loss and valid\_loss}}{5}{figure.10}\protected@file@percent } +\newlabel{fig:btgraph}{{10}{5}{Comparing with and without batch transforms on error\_rate, train\_loss and valid\_loss}{figure.10}{}} +\@writefile{toc}{\contentsline 
{subsection}{\numberline {\mbox {VI-B}}Error Rate of Thighs}{5}{subsection.6.2}\protected@file@percent } \@writefile{lot}{\contentsline {table}{\numberline {I}{\ignorespaces User Training}}{5}{table.1}\protected@file@percent } \newlabel{tab:user-train}{{I}{5}{User Training}{table.1}{}} \@writefile{lot}{\contentsline {table}{\numberline {II}{\ignorespaces User Testing}}{5}{table.2}\protected@file@percent } \newlabel{tab:user-test}{{II}{5}{User Testing}{table.2}{}} -\@writefile{lof}{\contentsline {figure}{\numberline {10}{\ignorespaces Comparing with and without batch transforms on error\_rate, train\_loss and valid\_loss}}{5}{figure.10}\protected@file@percent } -\newlabel{fig:btgraph}{{10}{5}{Comparing with and without batch transforms on error\_rate, train\_loss and valid\_loss}{figure.10}{}} +\@writefile{toc}{\contentsline {section}{\numberline {VII}Discussion}{5}{section.7}\protected@file@percent } +\newlabel{sec:discussion}{{VII}{5}{Discussion}{section.7}{}} +\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VII-A}}Limitations}{5}{subsection.7.1}\protected@file@percent } +\newlabel{sec:limitations}{{\mbox {VII-A}}{5}{Limitations}{subsection.7.1}{}} \@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {VII-B}}Future Work}{5}{subsection.7.2}\protected@file@percent } \newlabel{sec:futurework}{{\mbox {VII-B}}{5}{Future Work}{subsection.7.2}{}} -\@writefile{toc}{\contentsline {section}{\numberline {VIII}Conclusion}{5}{section.8}\protected@file@percent } \bibdata{ref} \bibcite{tkinter}{1} \bibcite{zoom-advanced}{2} @@ -96,4 +95,5 @@ \bibcite{machinelearning}{5} \bibcite{thighdeology}{6} \bibstyle{plain} +\@writefile{toc}{\contentsline {section}{\numberline {VIII}Conclusion}{6}{section.8}\protected@file@percent } \@writefile{toc}{\contentsline {section}{References}{6}{section*.2}\protected@file@percent } diff --git a/wAiFu.log b/wAiFu.log index 0472013..f17b68e 100644 --- a/wAiFu.log +++ b/wAiFu.log @@ -1,4 +1,4 @@ -This is pdfTeX, Version 
3.14159265-2.6-1.40.20 (TeX Live 2019/Debian) (preloaded format=pdflatex 2021.10.22) 5 JUN 2022 20:53 +This is pdfTeX, Version 3.14159265-2.6-1.40.20 (TeX Live 2019/Debian) (preloaded format=pdflatex 2021.10.22) 6 JUN 2022 18:03 entering extended mode restricted \write18 enabled. %&-line parsing enabled. @@ -468,10 +468,10 @@ File: umsb.fd 2013/01/14 v3.01 AMS symbols B pdfTeX warning: pdflatex (file ./img/ai_diagram.pdf): PDF inclusion: found PDF version <1.7>, but at most version <1.5> allowed - + File: img/ai_diagram.pdf Graphic file (type pdf) -Package pdftex.def Info: img/ai_diagram.pdf used on input line 54. +Package pdftex.def Info: img/ai_diagram.pdf used on input line 52. (pdftex.def) Requested size: 180.67455pt x 188.62422pt. [1{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map} @@ -481,41 +481,41 @@ Package pdftex.def Info: img/ai_diagram.pdf used on input line 54. pdfTeX warning: pdflatex (file ./img/thighs_diagram.drawio.pdf): PDF inclusion: found PDF version <1.7>, but at most version <1.5> allowed - + File: img/thighs_diagram.drawio.pdf Graphic file (type pdf) -Package pdftex.def Info: img/thighs_diagram.drawio.pdf used on input line 122. +Package pdftex.def Info: img/thighs_diagram.drawio.pdf used on input line 117. (pdftex.def) Requested size: 258.0pt x 161.67961pt. -Overfull \hbox (6.0pt too wide) in paragraph at lines 122--123 +Overfull \hbox (6.0pt too wide) in paragraph at lines 117--118 [][] [] LaTeX Font Info: Trying to load font information for OT1+pcr on input line 1 -34. +29. (/usr/share/texlive/texmf-dist/tex/latex/psnfss/ot1pcr.fd File: ot1pcr.fd 2001/06/04 font definitions for OT1/pcr. ) - + File: img/data_sets.png Graphic file (type png) -Package pdftex.def Info: img/data_sets.png used on input line 137. +Package pdftex.def Info: img/data_sets.png used on input line 132. (pdftex.def) Requested size: 258.0pt x 61.5058pt. 
-Overfull \hbox (6.0pt too wide) in paragraph at lines 137--138 +Overfull \hbox (6.0pt too wide) in paragraph at lines 132--133 [][] [] File: img/crop1.png Graphic file (type png) -Package pdftex.def Info: img/crop1.png used on input line 155. +Package pdftex.def Info: img/crop1.png used on input line 149. (pdftex.def) Requested size: 154.80157pt x 164.05174pt. File: img/crop2.png Graphic file (type png) -Package pdftex.def Info: img/crop2.png used on input line 161. +Package pdftex.def Info: img/crop2.png used on input line 155. (pdftex.def) Requested size: 154.80157pt x 164.05174pt. LaTeX Warning: `h' float specifier changed to `ht'. @@ -526,39 +526,34 @@ LaTeX Warning: `h' float specifier changed to `ht'. File: img/tinder.png Graphic file (type png) -Package pdftex.def Info: img/tinder.png used on input line 171. +Package pdftex.def Info: img/tinder.png used on input line 165. (pdftex.def) Requested size: 154.80157pt x 234.61816pt. [3 <./img/thighs_diagram.drawio.pdf> <./img/data_sets.png (PNG copy)>] LaTeX Warning: `h' float specifier changed to `ht'. (./csv/test1.csv) -Underfull \vbox (badness 1173) has occurred while \output is active [] - - + File: img/no_batch_transform1.png Graphic file (type png) -Package pdftex.def Info: img/no_batch_transform1.png used on input line 200. +Package pdftex.def Info: img/no_batch_transform1.png used on input line 193. (pdftex.def) Requested size: 232.19843pt x 96.04327pt. - + [4 <./img/crop1.png> <./img/crop2.png> <./img/tinder.png>] + File: img/with_batch_transform2.png Graphic file (type png) -Package pdftex.def Info: img/with_batch_transform2.png used on input line 206. +Package pdftex.def Info: img/with_batch_transform2.png used on input line 199. (pdftex.def) Requested size: 232.19843pt x 96.04327pt. - + File: img/with_vs_without_batch_transforms.png Graphic file (type png) Package pdftex.def Info: img/with_vs_without_batch_transforms.png used on inpu -t line 212. +t line 205. 
(pdftex.def) Requested size: 242.52063pt x 232.26645pt. - -LaTeX Warning: `h' float specifier changed to `ht'. - -[4 <./img/crop1.png> <./img/crop2.png> <./img/tinder.png>] [5 <./img/no_batch_t -ransform1.png> <./img/with_batch_transform2.png> <./img/with_vs_without_batch_t -ransforms.png>] (./wAiFu.bbl) +[5 <./img/no_batch_transform1.png> <./img/with_batch_transform2.png> <./img/wit +h_vs_without_batch_transforms.png>] (./wAiFu.bbl) ** Conference Paper ** Before submitting the final camera ready copy, remember to: @@ -570,22 +565,22 @@ Before submitting the final camera ready copy, remember to: uses only Type 1 fonts and that every step in the generation process uses the appropriate paper size. -Package atveryend Info: Empty hook `BeforeClearDocument' on input line 271. +Package atveryend Info: Empty hook `BeforeClearDocument' on input line 263. [6 ] -Package atveryend Info: Empty hook `AfterLastShipout' on input line 271. +Package atveryend Info: Empty hook `AfterLastShipout' on input line 263. (./wAiFu.aux) -Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 271. -Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 271. +Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 263. +Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 263. Package rerunfilecheck Info: File `wAiFu.out' has not changed. (rerunfilecheck) Checksum: 607914959793BD1A383D08B0B432B5EB;1439. -Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 271. +Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 263. 
) Here is how much of TeX's memory you used: 9483 strings out of 483183 140046 string characters out of 5966291 - 408176 words of memory out of 5000000 + 406176 words of memory out of 5000000 24269 multiletter control sequences out of 15000+600000 579147 words of font info for 116 fonts, out of 8000000 for 9000 14 hyphenation exceptions out of 8191 @@ -598,10 +593,10 @@ ype1/urw/courier/ucrr8a.pfb>< /usr/share/texlive/texmf-dist/fonts/type1/urw/times/utmr8a.pfb> -Output written on wAiFu.pdf (6 pages, 2106818 bytes). +Output written on wAiFu.pdf (6 pages, 2107818 bytes). PDF statistics: - 291 PDF objects out of 1000 (max. 8388607) - 246 compressed objects within 3 object streams + 292 PDF objects out of 1000 (max. 8388607) + 247 compressed objects within 3 object streams 64 named destinations out of 1000 (max. 500000) 238 words of extra memory for PDF output out of 10000 (max. 10000000) diff --git a/wAiFu.pdf b/wAiFu.pdf index d6c9055..3a53288 100644 Binary files a/wAiFu.pdf and b/wAiFu.pdf differ diff --git a/wAiFu.tex b/wAiFu.tex index 3caf141..b5bee48 100644 --- a/wAiFu.tex +++ b/wAiFu.tex @@ -24,7 +24,7 @@ \maketitle \begin{abstract} - For too many years have the world of Artificial Intelligence and the world of Hentai been separate ecosystems in which they do not realize the powerful potential of an alliance. Project Hentai AI aims to bring Artificial Intelligence into the sphere of Hentai, Ecchi and Lewds. In this paper, we propose a Witty Artificial Intelligence Framework Utilization (wAiFu). This framework is built for processing and labeling data, as well as training machine learning models to classify images of lewd anime/manga and hentai based on subjective user rating. As a proof of concept, this framework is applied to images of lewd anime thighs labeled using a boolean method. A dataset of 1000 images is collected, processed and labeled before being loaded into a fastai implementation of a Convolutional Neural Network designed for Computer Vision. 
The retraining of a resnet34 model for 20 epochs using labels from three different users resulted in an accuracy of over 70\%. Furthermore, a couple of limitations were identified, most significantly the small size of the dataset could cause the model to overfit. As mitigation, the data was transformed in batches. Future work in Project Hentai AI will focus extra on upscaling the data collection phase. + For too many years have the world of Artificial Intelligence (AI) and the world of Hentai been separate ecosystems. Project Hentai AI aims to bring AI into the sphere of Hentai, Ecchi and Lewds. In this paper, we propose a Witty Artificial Intelligence Framework Utilization (wAiFu). This framework is built for processing and labeling data, as well as training machine learning models to classify images of lewd anime/manga and hentai based on subjective user rating. As a proof of concept, this framework is applied to images of lewd anime thighs labeled using a boolean method. A dataset of 1000 images is collected, processed and labeled before being loaded into a fastai implementation of a Convolutional Neural Network (CNN) designed for Computer Vision. The retraining of a resnet34 model for 20 epochs using labels from three different users resulted in an accuracy of 70\%, 78\% and 97\%. Furthermore, a couple of limitations were identified, most importantly that the small size of the dataset could cause the model to overfit. As mitigation, the data was augmented using batch transforms in fastai. Future work in Project Hentai AI will focus extra on upscaling the data collection phase. \end{abstract} \begin{IEEEkeywords} @@ -32,9 +32,7 @@ deep learning, DL, machine learning, ML, artificial intelligence, AI, thighs, th \end{IEEEkeywords} \section{Introduction} \label{sec:intro} -It all began when a friend started reviewing anime thighs sent their way. 
The reviews were simply approved or disapproved, but the surprisingly low amount of approved images sparked the idea of a machine learning model capable of learning an individual's taste in anime thighs. By feeding a model images and their respective rating a model could be able to learn an individual's subjective taste. -% More here -The framework of wAiFu is not limited to lewd anime thighs, but can very easily be extended to other areas e.g., tits, ass, abs, middriffs and armpits. +It all began when a friend started reviewing anime thighs sent their way. The reviews were simply approved or disapproved, but the surprisingly low amount of approved images sparked the idea of a machine learning model capable of learning an individual's taste in anime thighs. By feeding a model images with rating labels, it could be able to learn an individual's subjective taste. The framework of wAiFu is not limited to lewd anime thighs, but can very easily be extended to other features e.g., tits, ass, abs, middriffs and armpits. The code of all tools in Project Hentai AI is open source and can be found at \url{https://git.hentai-ai.org}. \section{Background} \label{sec:background} @@ -59,8 +57,7 @@ While ML needs to perform the feature extraction manually from the input before Machine Learning and Deep Learning falls under the discipline of Artificial Intelligence in computer science, visually presented in Figure~\ref{fig:ai}. \subsection{Hentai and Thighdeology} \label{sec:hentai} -For the purpose of this study and future studies in Project Hentai AI, the data in the datasets are categorised in three definitions: \emph{Hentai}, \emph{Ecchi} and \emph{Lewd}. -In its simplest definition, Hentai is simply anime and manga pornography and can be seen as the most extreme out of the three. 
Ecchi on the other hand, when used as an adjective, translates to ``sexy'', ``dirty'' or ``naughty'', and has been used to describe anime and manga with \emph{sexual overtones} (playful sexuality or softcore). Lewd in these studies is defined as \emph{sexual undertones}. A detailed differentiation between the three categories is planned for a separate study. Project Hentai AI will include ecchi and lewds, even though the name of the project uses the term hentai. +For the purpose of this and future studies in Project Hentai AI, the data in the datasets are categorised in three definitions: \emph{Hentai}, \emph{Ecchi} and \emph{Lewd}. In its simplest definition, Hentai can be described as anime and manga pornography. Ecchi on the other hand, when used as an adjective, translates to ``sexy'', ``dirty'' or ``naughty'', and has been used to describe anime and manga with \emph{sexual overtones} (playful sexuality or softcore). Lewd in these studies is defined as \emph{sexual undertones}. A detailed differentiation between the three categories is planned for a separate study. Project Hentai AI will include ecchi and lewds, even though the name of the project uses the term hentai. Thighdeology is the worship of thick anime thighs which has its Mecca on the Thighdeology subreddit~\cite{thighdeology}. The top two rules on the subreddit are: (1) All images must be thigh-focused and (2) No Pictures of Sex (Nudity is allowed). The second rule is a clear demonstration of the distinction between hentai and ecchi described above. The dataset used for wAiFu is images of lewd anime thighs in accordance with these two rules. @@ -71,7 +68,7 @@ The epigraph which crowns the website says it all: \section{Method} \label{sec:method} -\emph{wAiFu} stands for Witty Artificial Intelligence Framework Utilization, and its goal is to standardize the process of creating a subjectively labeled dataset for machine learning. 
This means that a single set of images can be used as separate datasets depending on the subjective labeling. A system is set up for homogenizing the images (filename and file extensions), cropping the images to isolate the area of interest as much as possible and finally labeling the images using a separate file for mapping each filename to its subjective labeling. +\emph{wAiFu} stands for Witty Artificial Intelligence Framework Utilization, and its goal is to standardize the process of creating a subjectively labeled dataset for machine learning. This means that a single set of images can be used as separate datasets depending on the subjective labeling. A script homogenized the images (filename and file extensions). An application was developed for cropping the images to isolate the area of interest as much as possible. Finally an application was developed to label the images using a separate file for mapping each filename to its subjective labeling. \section{Design} \label{sec:design} The following section describes the design of wAiFu in its separate components in detail: the data collection, the data preparation, the data labeling and finally the machine learning API. @@ -88,9 +85,7 @@ The following section describes the design of wAiFu in its separate components i \end{itemize} ~\\\noindent After collection, the data was manually screened for (A) presence of thighs (B) image quality and (C) image \emph{cropability}. The presence of thighs simply implies that the image in question contains a section of the lower body of a humanoid character. The vast majority of the characters depicted in the images collected were of the feminine nature, although this was most likely due to the skewed ratio of feminine/masculine thighs from the sources themselves and not due to any discrimination during the manual collecting. This is included within future work in Section~\ref{sec:futurework}. - -Image quality refers to the resolution of the picture. 
When finding duplicates, the one with higher resolution was kept. Some images where included in the dataset even if the quality of the resolution was below average due to either its content or source. - +Image quality refers to the resolution of the picture. When finding duplicates, the one with higher resolution was kept. Some images were included in the dataset due to their contents, even if their resolution was below average. Image cropability refers to the composition of the picture. Since the focus of the first dataset in wAiFu is \emph{thighs}, it is preferred to isolate the thighs from other factors in the image which could influence the labeling, such as: faces, tits and other eye-catching details (some of the cropped images in the dataset does contain the ass region due to non-perfect but acceptable levels of cropability). \subsection{Data Preparation} \label{sec:dataprep} @@ -104,7 +99,7 @@ In order to get a uniform dataset the images collected were converted from JPG/J The naming convention was arbitrarily decided to be structured as \textbf{thighs-id.png} where \textbf{id} is an increasing nonce (number only used once) padded with four zeroes (e.g., \textbf{thighs-0001.png}). -The images were then cropped to contain as little as possible apart from the topic at hand (thighs). This was done with the intention of focusing both the manual labeling process as well as the machine learning training on the thighs. If the character on the image would have a certain hair color this could potentially influence the user when labeling the dataset, and later might be picked up during the learning and thus distorting the focus on the subject matter for this study. +The images were then cropped to contain as little as possible apart from the topic at hand (thighs). This was done with the intention of focusing both the manual labeling process and the machine learning training on the thighs.
If the character in the image had an unrelated feature, this could influence the user when labeling the dataset, and might later be picked up during learning, distorting the focus on the subject matter of this study. After cropping the original non-cropped images are kept with their original name, while the newly cropped images get an appended notation of having undergone the procedure (e.g., \textbf{thighs-0001-crop.png}). The cropped images were placed in a separate directory from the original images. By keeping both datasets, this study provides the possibility of utilizing the non-cropped images for future work. @@ -116,7 +111,7 @@ The labeling of datasets in wAiFu is categorised in two different methods: \item Boolean labeling \item Scale labeling \end{itemize} -The \emph{Boolean labeling} consist of two disjunctive values (e.g., True/False, Yes/No, Approved/Disapproved, 1/0) which is the closest to the reviews previously gotten when brokering pictures of anime thighs manually. An image would be sent and an Approved/Disapproved would be received in return. A diagram example is seen in Figure~\ref{fig:protocol}. +The \emph{Boolean labeling} consists of two disjunctive values (e.g., True/False, Yes/No, Approved/Disapproved, 1/0), which is the closest to the responses previously received when brokering pictures of anime thighs manually. An image would be sent and an Approved/Disapproved would be received in return. An example diagram is shown in Figure~\ref{fig:protocol}. \begin{figure}[t!] \includegraphics[width=.5\textwidth]{img/thighs_diagram.drawio.pdf} @@ -124,14 +119,14 @@ The \emph{Boolean labeling} consist of two disjunctive values (e.g., True/False, \label{fig:protocol} \end{figure} -The \emph{Scale labeling} ranks the images on a scale (e.g., 0-10, 1-5, A-F). This could be considered to be an extension of Boolean labeling (which would be seen as a scale of 0-1) by adding float values in between.
+The \emph{Scale labeling} ranks the images on a scale (e.g., 0-10, 1-5, A-F). This could be considered to be an extension of Boolean labeling (which would be seen as a scale of 0-1) by adding float values in between. The scope of this study will cover boolean labeling only, but scale labeling is included in future work. The data labeling implementation is detailed in Section~\ref{sec:impl_labelapp} \subsection{fastai} \label{sec:fastai} The AI implementation is using fastai, a deep learning library providing machine learning practitioners with high-level components creating state-of-the-art results in standard deep learning domains~\cite{fastai}. For the purpose of boolean labeling in this project, a single-label classification structure is implemented using various building blocks. The pictures and their labels are loaded into a \emph{DataLoaders} object. This object is responsible for maching labels with images, applying item transforms (transforms applied to each image individually) and batch transforms (transforms applied to each batch during training). It is also responsible of splitting the dataset into various sets: \emph{training, validation} and \emph{testing} (see Figure~\ref{fig:data_sets}). The training set is used to train a given model, which sees and learns from this data. The validation set is used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration. Unlike the training set, the model only occasionally sees this data but never learns from it. The testing set is used to provide an unbiased evaluation of a \emph{final model fit} of the training dataset. -The DataLoaders object is then combined with a model and a metric to create a \emph{Learner} object. 
The model can be pre-trained, which means that some object and shape recognition can be used as a foundation to train a model for a more specific computer vision problem. This method is called \emph{transfer learning}. The Learner object has a bunch of methods including: \texttt{fine\_tune}, \texttt{predict} and \texttt{export}. The \texttt{fine\_tune} method first freezes all layers except the last one for one cycle (a ``prequel'' epoch), and then unfreezes all layers before running the epochs. This process of freezing and unfreezing layers in the Convolutional Neural Network improves the performance of transfer learning. So using \texttt{fine\_tune(2)} would first run a cycle only adjusting the last layer, then run 2 epochs adjusting all layers. The \texttt{predict} method is simply loading a single image into the model which then predicts the label. This is usually done after the training to sample the accuracy of the model. The \texttt{export} method saves the trained model to a file. +The DataLoaders object is then combined with a model and a metric to create a \emph{Learner} object. The model can be pre-trained, which means that some object and shape recognition can be used as a foundation to train a model for a more specific computer vision problem. This method is called \emph{transfer learning}. The Learner object has a bunch of methods including: \texttt{fine\_tune}, \texttt{validate} and \texttt{export}. The \texttt{fine\_tune} method first freezes all layers except the last one for one cycle (a ``prequel'' epoch), and then unfreezes all layers before running the epochs. This process of freezing and unfreezing layers in the Convolutional Neural Network improves the performance of transfer learning. So using \texttt{fine\_tune(2)} would first run a cycle only adjusting the last layer, then run 2 epochs adjusting all layers. The \texttt{validate} method is simply running predictions on a set of data and comparing the predictions to the actual labels. 
This is done after the training to sample the accuracy of the model on a testing set. The \texttt{export} method saves the trained model to a file for future use. \begin{figure} \includegraphics[width=.5\textwidth]{img/data_sets.png} @@ -140,7 +135,6 @@ The DataLoaders object is then combined with a model and a metric to create a \e \end{figure} \section{Implementation} \label{sec:implementation} -The code of all tools in Project Hentai AI is open source and can be found at \url{https://git.hentai-ai.org}. \subsection{Data Preparation} \label{sec:datatfms} The following section goes through the implementation of homogenizing the dataset, including renaming, changing extensions and cropping the images. @@ -182,13 +176,12 @@ The name of the label application is ``Hentai Tinder'' (see Figure~\ref{fig:tind The code is open source and can be found at: \url{https://git.hentai-ai.org/?p=hentai-tinder.git/.git} \subsection{Deep Learning with fastai} \label{sec:impl_deeplearning} -The deep learning framework (fastai) was implemented using interactive python notebooks running on Google Colab\footnote{\url{https://colab.research.google.com}} connected to Google Drive\footnote{\url{https://drive.google.com}} for storing csv-files, dataset and trained models. +The deep learning framework (fastai) was implemented using interactive Python notebooks running on Google Colab\footnote{\url{https://colab.research.google.com}} connected to Google Drive\footnote{\url{https://drive.google.com}} for storing CSV files, the dataset and trained models. The dataset is loaded using the pandas library from the CSV file generated by Hentai Tinder. The dataset is then split 8:1:1 into training, validation and testing sets. Transforms are specified for the training and validation sets in the dataloader, which is then added to the learner. The learner uses the dataloader and a downloaded resnet34 for the training. The model is trained for 20 epochs and saved to a file.
The testing set is then used to get the final accuracy measurements. The notebook is open source and can be found at: \url{https://git.hentai-ai.org/?p=waifu-notebook.git/.git} -% TODO add to git \section{Results} \label{sec:results} \subsection{Justifying Additional Transforms} -One of the main observations when training on such a small dataset was the tendency to overfitting. There are two types of transformations applied to the dataset before training: \texttt{item\_tfms} and \texttt{batch\_tfms}. The item\_tfms for this implementation is using \textit{RandomResizedCrop} which will crop every image randomly to 224x244 with a minimum scaling of 0.75. The batch\_tfms is applying many more tranformations to images in batches between each epoch. These transformations include: zooming, flipping, rotating and changing the brightness. Figure~\ref{fig:wobt} shows how \emph{only} item\_tfms transform the dataset. Figure~\ref{fig:wbt} shows how batch\_tfms additionally transforms the dataset further. Figure~\ref{fig:btgraph} shows the batch\_tfms's effect on error\_rate, train\_loss and valid\_loss. +One of the main observations when training on such a small dataset was the tendency to overfitting. There are two types of transformations applied to the dataset before training: \texttt{item\_tfms} and \texttt{batch\_tfms}. The item\_tfms for this implementation is using \textit{RandomResizedCrop} which will crop every image randomly to 224x244 with a minimum scaling of 0.75. The batch\_tfms is applying many more tranformations to images in batches between each epoch. These transformations include: zooming, flipping, rotating and changing the brightness. Figure~\ref{fig:wobt} shows how \emph{only} item\_tfms transform the dataset. Figure~\ref{fig:wbt} shows how batch\_tfms additionally transforms the dataset further. Figure~\ref{fig:btgraph} shows the batch\_tfms's effect on error\_rate, train\_loss and valid\_loss. 
The loss for both the training and validation set should be as low as possible. A validation loss that stays high while the training loss falls indicates a failure to generalize the learning. We see in the bottommost graph in Figure~\ref{fig:btgraph} that the validation loss is dramatically decreased using batch transforms. %\begin{figure} % \fbox{\includegraphics[width=.45\textwidth]{img/overfitting.png}} @@ -215,8 +208,7 @@ One of the main observations when training on such a small dataset was the tende \end{figure} \subsection{Error Rate of Thighs} -The dataset containing 1000 images was labled using Hentai Tinder (Section~\ref{sec:impl_labelapp}) by three individual persons: User A, User B and User C. A table of the training result after 20 epochs for each user can be seen in Table~\ref{tab:user-train}. The three different users had varying rates of approval on the dataset with user C liking almost half of the dataset. The lowest error\_rate observed came from the dataset labled by user B. With the error\_rate being close to the rate of approval, a sanity check with a confusion matrix showed that the model did not just predict false on the whole dataset. -In Table~\ref{tab:user-test} we show the true/false positive/negative results on the testing set for each user. Furthermore we show the accuracy on the testing set using: +The dataset containing 1000 images was labeled using Hentai Tinder (Section~\ref{sec:impl_labelapp}) by three individual people: User A, User B and User C. The training results after 20 epochs for each user can be seen in Table~\ref{tab:user-train}. The three users had varying rates of approval on the dataset, with User C liking almost half of the dataset. With the error rate sometimes being close to the rate of approval, a sanity check with a confusion matrix showed that the model did not simply predict a single class (true or false) for the whole dataset or start overfitting. In Table~\ref{tab:user-test} we show the true/false positive/negative results on the testing set for each user.
Furthermore we show the accuracy on the testing set using: \begin{equation} \frac{TP+TN}{TP+TN+FP+FN} @@ -249,18 +241,18 @@ In Table~\ref{tab:user-test} we show the true/false positive/negative results on \section{Discussion} \label{sec:discussion} \subsection{Limitations} \label{sec:limitations} -The size of the lewd anime thighs dataset is only 1000 images. This leads to overfitting on the training or the validation set which can be mitigated slightly by applying transformations. The small dataset is due to the time-consuming task of manually cropping and labeling the dataset. Since the model is trying to learn an individual's taste, that individual must label the full dataset. +The size of the lewd anime thighs dataset is only 1000 images. This leads to overfitting during training which can be mitigated slightly by applying transformations. The small dataset is due to the time-consuming task of manually cropping and labeling the dataset. Since the model is trying to learn an individual's taste, the individual must label the full dataset. \subsection{Future Work} \label{sec:futurework} In order to increase the size of the dataset and thereby obtaining a more robust accuracy from the machine learning model, future research in Project Hentai AI will spend some more focus on automating the collection, transformation and labeling of data. -In this study, only boolean labeling was considered when reviewing lewd anime thighs. But even in the world of Hentai thighs are more often than not in a gray-zone as opposed to black or white. A future work in wAiFu would be to extend the labeling application (\emph{Hentai Tinder}) to have a mode or a version capable of using rate labeling on a scale. This could be as easy as presenting the user with a 5-star system, similar to reviewing restaurants or hotels, where each image gets rated from 1-5. +In this study, only boolean labeling was considered when reviewing lewd anime thighs. 
But even in the world of Hentai, thighs are more often than not in a gray zone as opposed to black or white. Future work would be to extend the labeling application (\emph{Hentai Tinder}) to have a mode or a version capable of labeling on a rating scale. This could be as easy as presenting the user with a 5-star system, similar to reviewing restaurants or hotels, where each image gets rated from 1 to 5. -As metioned in Section~\ref{sec:datacollection}, the dataset mainly contained lewd feminine thighs. One area of future work would be to investigate the masculine/feminine feature ratio effect on the model accuracy robustness. +As mentioned in Section~\ref{sec:datacollection}, the dataset mainly contained lewd feminine thighs. One area of future work would be to investigate the effect of the masculine/feminine feature ratio on the robustness of the model accuracy. In other words, this would involve using other or larger datasets. \section{Conclusion} -In Project Hentai AI: wAiFu (Witty Artificial Intelligence Framework Utilization) we established a framework for processing and labeling data using our own newly developed tools. We then with fastai re-trained a Convolutional Neural Network to classify images of lewd anime thighs based on subjective ratings from three individual users with an accuracy over 70\%. Even though batch transforms where applied to mitigate overfitting, we believe that the dataset could still be too small. The size of the dataset is impacted by the pre-processing overhead (cropping) of the general dataset images, as well as the manual labeling for each new user. +In Project Hentai AI: wAiFu (Witty Artificial Intelligence Framework Utilization) we established a framework for processing and labeling data using our own newly developed tools. We re-trained a Convolutional Neural Network using fastai to classify images of lewd anime thighs based on subjective ratings from three individual users with accuracies of 70\%, 78\% and 97\%.
Even though batch transforms were applied to mitigate overfitting, we believe that the dataset could still be too small. The size of the dataset is impacted by the pre-processing overhead (cropping) of the general dataset images, as well as the manual labeling for each new user. \section*{Acknowledgement} We would like to thank Kittey for coming up with the name of the project: \emph{wAiFu}. We would also like to thank Hood Classic\#0148 for coming up with the name of the labeling app: \emph{Hentai Tinder}. Finally, we thank the three anonymous users in this study.