Architecture of microservices Vector

GitHub Repo https://github.com/kappaquindici/AI-Document-Processing-Platform

kappaquindici/AI-Document-Processing-Platform

AI Document Processing Platform is a cloud-native system designed to process, understand, and semantically match large volumes of documents. The platform combines Large Language Models (LLMs), vector embeddings, and a microservice architecture to build intelligent document processing pipelines.
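As a rough illustration of what "semantically match" can mean with vector embeddings, the sketch below ranks documents by cosine similarity to a query vector. It is not taken from the repository: the function names, file names, and three-dimensional vectors are all made up for the example, and a real system would obtain its embeddings from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_vec, candidates):
    """Return the (name, score) pair with the highest cosine similarity to query_vec."""
    scored = ((name, cosine_similarity(query_vec, vec)) for name, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
documents = {
    "invoice_2023.pdf": [0.9, 0.1, 0.0],
    "purchase_order.pdf": [0.8, 0.3, 0.1],
    "employee_handbook.pdf": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

name, score = best_match(query, documents)
print(name, round(score, 3))
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for comparing text embeddings.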

GitHub Repo https://github.com/Tanny1810/DocQuery

Tanny1810/DocQuery

Production-grade document ingestion and Retrieval-Augmented Generation (RAG) system demonstrating scalable backend architecture using FastAPI, message queues, and dedicated workers. Supports async processing, S3-based storage, text chunking, embeddings, vector search, and clean separation of concerns.
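The ingestion side of such a system can be sketched with nothing but the standard library. The example below is an assumption-laden stand-in, not the repository's code: an `asyncio.Queue` plays the role of the message queue, coroutine workers play the role of the dedicated worker processes, and `chunk_text`, `worker`, and `ingest` are hypothetical names. Chunks overlap so that text cut at a boundary still appears whole in one of the neighbouring chunks.

```python
import asyncio


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


async def worker(queue, results):
    """Pull (doc_id, text) jobs off the queue and store their chunks."""
    while True:
        doc_id, text = await queue.get()
        try:
            results[doc_id] = chunk_text(text)
        finally:
            queue.task_done()


async def ingest(documents, num_workers=2):
    """Enqueue every document, let the workers drain the queue, then stop them."""
    queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(num_workers)]
    for doc_id, text in documents.items():
        queue.put_nowait((doc_id, text))
    await queue.join()  # wait until every job has been marked done
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(ingest({"a.txt": "x" * 100, "b.txt": "short"}))
print({doc_id: len(chunks) for doc_id, chunks in results.items()})
```

In a deployment like the one described, the queue would be an external broker and the chunks would go on to an embedding step and a vector index; those stages are omitted here.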

GitHub Repo https://github.com/reddyprasade/Machine-Learning-Interview-Preparation

reddyprasade/Machine-Learning-Interview-Preparation

Prepare your technical skills. Here are the essential skills that a Machine Learning Engineer needs, as listed in the README files. Within each group are topics that you should be familiar with. Study tip: copy and paste this list into a document and save it to your computer for easy reference.

Computer Science Fundamentals and Programming
Data structures: Lists, stacks, queues, strings, hash maps, vectors, matrices, classes & objects, trees, graphs, etc.
Algorithms: Recursion, searching, sorting, optimization, dynamic programming, etc.
Computability and complexity: P vs. NP, NP-complete problems, big-O notation, approximation algorithms, etc.
Computer architecture: Memory, cache, bandwidth, threads & processes, deadlocks, etc.

Probability and Statistics
Basic probability: Conditional probability, Bayes' rule, likelihood, independence, etc.
Probabilistic models: Bayes nets, Markov Decision Processes, Hidden Markov Models, etc.
Statistical measures: Mean, median, mode, variance, population parameters vs. sample statistics, etc.
Proximity and error metrics: Cosine similarity, mean-squared error, Manhattan and Euclidean distance, log-loss, etc.
Distributions and random sampling: Uniform, normal, binomial, Poisson, etc.
Analysis methods: ANOVA, hypothesis testing, factor analysis, etc.

Data Modeling and Evaluation
Data preprocessing: Munging/wrangling, transforming, aggregating, etc.
Pattern recognition: Correlations, clusters, trends, outliers & anomalies, etc.
Dimensionality reduction: Eigenvectors, Principal Component Analysis, etc.
Prediction: Classification, regression, sequence prediction, etc.; suitable error/accuracy metrics.
Evaluation: Training-testing split, sequential vs. randomized cross-validation, etc.

Applying Machine Learning Algorithms and Libraries
Models: Parametric vs. nonparametric, decision tree, nearest neighbor, neural net, support vector machine, ensemble of multiple models, etc.
Learning procedure: Linear regression, gradient descent, genetic algorithms, bagging, boosting, and other model-specific methods; regularization, hyperparameter tuning, etc.
Tradeoffs and gotchas: Relative advantages and disadvantages, bias and variance, overfitting and underfitting, vanishing/exploding gradients, missing data, data leakage, etc.

Software Engineering and System Design
Software interface: Library calls, REST APIs, data collection endpoints, database queries, etc.
User interface: Capturing user inputs & application events, displaying results & visualization, etc.
Scalability: Map-reduce, distributed processing, etc.
Deployment: Cloud hosting, containers & instances, microservices, etc.

Move on to the final lesson of this course to find lots of sample practice questions for each topic!
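Several of the topics listed above fit in one small exercise: linear regression, gradient descent, and mean-squared error. The sketch below is only a practice example with made-up data and hypothetical function names; it fits a line y = w*x + b by repeatedly stepping against the gradient of the mean-squared error.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b with batch gradient descent on the mean-squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
predictions = [w * x + b for x in xs]
print(round(w, 4), round(b, 4), mean_squared_error(ys, predictions))
```

The learning rate matters: a value that is too large makes the updates diverge on this data, while a much smaller one needs far more epochs to reach the same fit.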

GitHub Repo https://github.com/asadali08527/document-based-gpt

asadali08527/document-based-gpt

A document-based GPT system that lets users upload documents and query them later. It is built on a microservice architecture and uses a FAISS vector store for document embeddings, allowing fast retrieval of relevant content for user queries.
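The upload-then-query flow can be sketched without FAISS itself. The class below is a hypothetical in-memory stand-in, not the project's code and not the FAISS API: it does an exhaustive Euclidean-distance search over stored vectors, which is the kind of search a flat L2 index performs. The `toy_embed` function and its fixed vocabulary are assumptions made so the example runs with the standard library (Python 3.8+ for `math.dist`); a real system would call an embedding model.

```python
import math

# Hypothetical fixed vocabulary used only by the toy embedding below.
VOCAB = ["invoice", "payment", "vacation", "policy", "contract"]


def toy_embed(text):
    """Count how often each vocabulary word occurs; a stand-in for a real embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


class InMemoryVectorStore:
    """Brute-force nearest-neighbour store keyed on Euclidean distance."""

    def __init__(self):
        self._vectors = []
        self._texts = []

    def add(self, text):
        """'Upload' a document: embed it and keep the vector next to the text."""
        self._vectors.append(toy_embed(text))
        self._texts.append(text)

    def search(self, query, k=1):
        """Return the k stored texts closest to the query as (distance, text) pairs."""
        q = toy_embed(query)
        scored = [(math.dist(q, vec), text) for vec, text in zip(self._vectors, self._texts)]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


store = InMemoryVectorStore()
store.add("invoice payment due for the contract")
store.add("vacation policy for all employees")

hits = store.search("when is the invoice payment due", k=1)
print(hits)
```

Exhaustive search is linear in the number of stored vectors, which is fine for a sketch; libraries such as FAISS exist to make the same lookup fast at much larger scale.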