Projects
Two use cases apply NLP analytics to streamline labor-intensive financial data processing. The first use case addresses the challenge of comparing financial reports across different companies, which is complicated by varying report styles. To overcome this, the solution involves automatically extracting data from digital reports and mapping it to a universal template using a TF-IDF weighted KNN-clustering mechanism. The second use case focuses on the sales team’s need to compile risk profiles from annual reports to recommend suitable financial products. By extracting information from annual reports and utilizing Generative AI techniques to complete standardized questionnaires, this approach aims to simplify and expedite the creation of risk profiles.
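A minimal sketch of the mapping step, assuming scikit-learn and hypothetical template fields; the production pipeline's exact TF-IDF weighting and clustering differ in detail:

```python
# Sketch (not the production pipeline): map a company-specific line item
# to the closest field of a hypothetical universal template using
# TF-IDF vectors and nearest-neighbour search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical universal-template fields and one extracted report label.
template_fields = ["total revenue", "net interest income",
                   "operating expenses", "profit before tax"]
extracted_label = "revenues, total"

# Character n-grams make the match robust to varying report styles.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
template_vecs = vectorizer.fit_transform(template_fields)

knn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(template_vecs)
dist, idx = knn.kneighbors(vectorizer.transform([extracted_label]))
print(template_fields[idx[0][0]], f"similarity={1 - dist[0][0]:.2f}")
```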
The Singapore Exchange (SGX) operates as a multi-asset exchange and a key hub for trading and investment, providing critical market data such as corporate actions, announcements, and trading information that listed companies must disclose under SGX regulations. This data is valuable for investment, risk management, and strategic decision-making. The problem was to assess the viability of using SGX’s liquidity-impact news as an input feature for forecasting aggregated customer balance movements. To address this, I conducted news coverage, periodicity, and efficacy analyses to evaluate the integration of SGX’s XML data feed as a supplementary source for cash flow forecasting in liquidity management scenarios.
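An illustrative sketch of the coverage and periodicity checks, using a made-up XML payload rather than SGX's actual feed schema:

```python
# Illustrative only: parse a news XML feed and measure daily coverage
# over business days before aligning it with balance-movement data.
import xml.etree.ElementTree as ET
import pandas as pd

xml_doc = """<feed>
  <item><date>2021-03-01</date><type>CORP_ACTION</type></item>
  <item><date>2021-03-01</date><type>ANNOUNCEMENT</type></item>
  <item><date>2021-03-03</date><type>ANNOUNCEMENT</type></item>
</feed>"""  # hypothetical payload, not the real SGX schema

rows = [(item.findtext("date"), item.findtext("type"))
        for item in ET.fromstring(xml_doc).iter("item")]
news = pd.DataFrame(rows, columns=["date", "type"])
news["date"] = pd.to_datetime(news["date"])

# Daily counts at business-day frequency: zero-filled gaps expose
# coverage and periodicity issues in the feed.
daily = news.groupby("date").size().asfreq("B", fill_value=0)
print(daily)
```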
Implemented and fine-tuned four use cases for liquidity management and financial instrument price auction forecasting, optimizing the client’s daily trading operations. Delivered comprehensive insights via a Tableau dashboard. Integrated an enterprise-level model monitoring solution with alarm-based thresholds, enabling proactive responses to model degradation and ensuring operational efficiency through timely adjustments.
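A hedged sketch of the alarm idea, with illustrative metric names and thresholds rather than the client's actual configuration:

```python
# Toy version of alarm-based monitoring: flag model degradation when a
# rolling error metric breaches a fixed threshold (values illustrative).
import numpy as np

def check_degradation(abs_errors, window=30, threshold=0.15):
    """Return True if the rolling mean absolute error breaches the limit."""
    recent = np.asarray(abs_errors)[-window:]
    return recent.mean() > threshold

daily_abs_errors = [0.08, 0.09, 0.20, 0.22, 0.25]  # toy history
if check_degradation(daily_abs_errors, window=5):
    print("ALERT: forecast error above threshold, review model")  # alarm hook
```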
Migrated an existing Ionic-based web app that monitors and controls equipment maintenance activity to an in-house Angular-derivative web framework, with an estimated 4,000 daily users across 5 Fabs. This allows users to access the application not only via tablets but also on their computers.
Each processing stage can be broken down into smaller steps whose durations can be as short as a few seconds, and subtle yet serious defects occur only within these specific time windows. Univariate temporal data is collected from multiple sources: temperature, pressure, position w.r.t. origin, etc. Two challenges are typical for this data type: the time-shift nature of time series and gradual run-to-run offset drift. To resolve them, we applied Fast Fourier Transform profiling, Dynamic Time Warping, and Functional Data Analysis to the applicable use cases. Two projects passed the POC stage (estimated per-annum cost reduction of $4.3M), and one was under review for upcoming integration into the company's smart-band control system. I also applied post hoc analysis correlating tool sensors with inline quality measurements to identify machine-mismatch issues and maximize product throughput.
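As an illustration of the time-shift problem, below is a plain-NumPy sketch of Dynamic Time Warping on two hypothetical sensor traces; the actual pipelines also used FFT profiling and FDA:

```python
# Classic O(n*m) DTW: two traces of the same process step compare as
# similar even when one is time-shifted (traces here are synthetic).
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping cost between two 1-D sensor traces."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

golden = np.sin(np.linspace(0, 3, 100))           # reference temperature trace
shifted = np.sin(np.linspace(0, 3, 100) - 0.3)    # same step, time-shifted
print(dtw_distance(golden, shifted))  # stays small despite the shift
```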
Manual inspection inside a tool is laborious, yet a clean chamber is crucial to ensure no inherent damage to incoming batches of wafers. What started as a request to install a trigger-on-demand camera to ease the tool owner's responsibility, we extended into schedule-based anomaly object detection that all but removes the task from their daily routine. The solution uses a UNet model to reconstruct the tool's interior, with the residual image capturing abnormal "noise", e.g. dust or leftover material waste. I also derived an augmentation strategy to keep the model robust across different tool types worldwide. Estimated per-annum cost reduction of $4.2M.
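A hedged sketch of the residual scoring step, with a stand-in function in place of the trained UNet: reconstruction error localizes the abnormal "noise", and the fraction of high-residual pixels serves as an anomaly score.

```python
# Residual-based anomaly scoring; `reconstruct` stands in for a trained
# UNet-style reconstructor (hypothetical handle, not the real model).
import numpy as np

def anomaly_score(image, reconstruct, pixel_thresh=0.1):
    """Fraction of pixels whose reconstruction residual exceeds the threshold."""
    residual = np.abs(image - reconstruct(image))
    return (residual > pixel_thresh).mean()

rng = np.random.default_rng(0)
frame = rng.random((128, 128)).astype(np.float32)  # toy chamber snapshot
reconstruct = lambda x: x * 0.98                   # stand-in for the UNet
score = anomaly_score(frame, reconstruct)
print("flag for inspection" if score > 0.05 else "clean", f"score={score:.3f}")
```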
Our study identified that some machine degradation events correlate with human-detectable sound patterns within the processing tool, potentially enabling corrective and predictive maintenance. We conducted two proofs of concept at two monitoring levels: (1) tool-agnostic limit control based on acoustic statistical summaries and (2) tool-specific anomaly detection of repetitive patterns that supports tool diagnostics, leveraging a hybrid training paradigm. Architecture-wise, we implemented a framework that unifies acoustic pipelines and provides the flexibility to integrate future solutions into the network. I was also the project lead for migrating the solution to Google Cloud Platform and for the worldwide fan-out. Estimated per-annum cost reduction of $1.1M.
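A minimal sketch of monitoring level (1), with illustrative control limits and a toy clip standing in for real tool audio:

```python
# Summarise each acoustic clip into simple statistics and apply limit
# control; the limits and 16 kHz sample rate are assumptions.
import numpy as np

def clip_summary(waveform, sample_rate=16_000):
    """RMS level and spectral centroid of one acoustic clip."""
    rms = np.sqrt(np.mean(waveform ** 2))
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1 / sample_rate)
    centroid = (freqs * spectrum).sum() / spectrum.sum()
    return rms, centroid

rng = np.random.default_rng(1)
clip = rng.normal(scale=0.05, size=16_000)  # one second of toy audio
rms, centroid = clip_summary(clip)
status = "BREACH: schedule maintenance" if (rms > 0.2 or centroid > 5_000) \
         else "within limits"
print(f"rms={rms:.3f}, centroid={centroid:.0f} Hz: {status}")
```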
Deep learning is currently receiving considerable attention from the machine learning community due to its predictive power. However, its lack of interpretability raises numerous concerns. Since neural networks are deployed in high-stakes domains, stakeholders expect to receive acceptable, human-interpretable explanations. We explain the decisions of neural networks using layered explanations: we use influence measures to compute a numerical value for each layer. Using layerwise influence measures, we identify the layers that contain the most explanatory power, and use those to generate explanations.
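The influence measure itself is beyond this summary, so the sketch below uses per-layer gradient norms purely as a stand-in to show what "a numerical value for each layer" looks like; it is not the paper's actual measure.

```python
# Stand-in layerwise score: one scalar per parameterised layer, here the
# gradient norm of each Linear layer's weights after a backward pass.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                      nn.Linear(16, 16), nn.ReLU(),
                      nn.Linear(16, 2))
x, y = torch.randn(4, 8), torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()

for name, module in model.named_children():
    if isinstance(module, nn.Linear):
        influence = module.weight.grad.norm().item()
        print(f"layer {name}: score {influence:.4f}")  # larger = more salient here
```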
The state of Singapore runs a unique large-scale public housing program, accounting for over 80 percent of its residential real estate. In addition to providing a social benefit to citizens and permanent residents in the form of subsidized housing, Singapore uses its housing allocation program to ensure ethnic diversity in its neighborhoods; however, limiting people's ability to freely choose apartments incurs some welfare loss. Our work studies this problem through the lens of computational economics.
Vietnamese and Korean are a language pair sharing many common semantic concepts that can be exploited to build a good statistical machine translation (SMT) system. In light of this, I created a mapping table for modality concepts (a verb can imply suggestion, politeness, or the social position of the speaker relative to the listener) and incorporated it into the data preprocessing pipeline, as sketched below. This was a collaborative project between the CLC lab, the KLE lab, and the SYSTRAN company.
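An illustrative fragment of that preprocessing step; the table entries here are hypothetical stand-ins for the real mapping table:

```python
# Tag Vietnamese modality markers with concepts shared with Korean
# before training; entries below are illustrative, not the full table.
MODALITY_TABLE = {
    "hãy": "SUGGESTION",
    "ạ": "POLITENESS",
    "xin": "DEFERENCE",
}

def annotate_modality(tokens):
    """Append the modality concept to each matching token, e.g. hãy|SUGGESTION."""
    return [f"{t}|{MODALITY_TABLE[t]}" if t in MODALITY_TABLE else t
            for t in tokens]

print(annotate_modality("xin mời anh hãy ngồi ạ".split()))
```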
With a WordNet-annotated corpus available on the English side, one can use the alignments produced by the GIZA toolkit to project the WordNet tags onto the Vietnamese side. Once this goal is achieved, the newly generated corpus is expected to contain rich semantic information with which the SMT system could reach better performance. The problem is that alignments come in many forms: 1-1, 1-n, m-1, and m-n. We therefore proposed heuristics for combining overlapping alignments to obtain the best projection result, which was then evaluated on a hand-labeled test set.
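A toy sketch of one such heuristic, with made-up indices and tags: when a Vietnamese token aligns to several tagged English tokens (an m-1 case), keep the majority WordNet tag among them.

```python
# Simplified projection heuristic; the proposed heuristics handled the
# full 1-1 / 1-n / m-1 / m-n cases, this shows majority voting for m-1.
from collections import Counter

en_tags = {0: "dog.n.01", 1: "run.v.01", 2: "run.v.01"}  # toy annotations
alignments = [(0, 1), (1, 0), (2, 0)]                    # (en_idx, vi_idx) pairs

vi_candidates = {}
for en_i, vi_i in alignments:
    if en_i in en_tags:
        vi_candidates.setdefault(vi_i, []).append(en_tags[en_i])

# Majority vote over overlapping alignments per Vietnamese token.
vi_tags = {vi_i: Counter(tags).most_common(1)[0][0]
           for vi_i, tags in vi_candidates.items()}
print(vi_tags)  # projected tags on the Vietnamese side
```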